1. Field of the Present Invention
The present invention generally relates to data processing systems, and more specifically, to methods and apparatuses residing in such systems that prevent the occurrence of deadlock from the execution of unresolvable system bus operations from producing deadlock.
2. History of Related Art
The evolution of the computer industry has been driven by the insatiable appetite of the consumer for ever increased speed and functionality. One species which has evolved from the above is the multi-processor computer.
Multi-processor systems, in similarity to other types of computer systems, have many different areas that are ripe for improvements. One such area is the processing of variable delay system bus operations.
Modern multi-processor systems typically include a number of processing elements, and a main memory, each of which are connected by a series of buses that ultimately terminate in a common system bus. The processing elements usually include a processor having a pre-determined amount of on-board cache and, in some cases, a cache hierarchy. The cache hierarchy, typically, includes a number of caches (e.g. level 0-2) which are interposed between the processor and the common system bus.
In general, operations, in such multi-processor systems, are performed by the processor, residing at the top of a cache hierarchy, placing an operation on the bus between the processor and the first off-board cache. The first off-board cache then propagates the operation, if necessary, to the next lower level cache, if it exists, which then repeats the propagation down the cache hierarchy, if necessary, until the operation finally arrives at the system bus.
Once the operation has arrived at the system bus, it is then snooped by all the caches monitoring the system bus. After a snooping cache detects an operation, it must determine whether or not the execution of the snooped operation can proceed. A cache may be unable or refuse to accept (execute) a snooped operation for any number of reasons. For example, the resources necessary to execute an operation, such as the state machines to process the snooped operation may be busy with other work and unable to process the snooped operation. In general, most system bus protocols allow any operation to be refused when a bus participant is unable to process the operation. If the snooping cache cannot process the operation, then it will send a "RETRY" signal on the system bus. The RETRY signal informs the initiator of the operation that execution thereof was unsuccessful, and that the operation should be re-tried, if still necessary, at a later point in time.
The amount of time that a participant has in order to make a decision concerning the acceptance of a snooped operation, and to send a snoop response (e.g. "RETRY") is usually fixed for any given system via the bus protocol.
Furthermore, operations in such multiprocessors typically fall into two classes: resolvable and unresolvable. In what follows, an operation is considered resolvable if the lowest level caches can examine the operation and determine, without traversing up their respective cache hierarchies, whether or not the operation must be presented to the cache hierarchy. In other words, if a lowest level cache can examine a system bus operation and determine, in a fixed, usually short period of time, that the operation need not be propagated up the cache hierarchy, the operation is resolvable.
Unresolvable operations are those operations that the lowest level cache cannot determine, in a fixed, usually short, period of time, whether or not the operation must be presented to the cache hierarchy.
For unresolvable operations, the lowest level cache usually must assume that the operation must be propagated up the cache hierarchy to insure the correct operation of the system. In other words, whenever presented with an unresolvable operation, a lowest level cache must assume that this operation is a new operation and must be propagated up the cache hierarchy to insure the correct operation of the system.
In example, the PowerPCTM architecture uses an ICBI instruction to invalidate a block of instructions in the processor instruction caches at the tops of the cache hierarchies in a system. Typically, caches below the processor are unified, that is they hold both instructions and data, and also do not maintain information on which previously loaded blocks have been placed in the instruction cache of the processor. In such a system, the ICBI instruction is unresolvable. When a lowest level cache snoops an ICBI instruction, the lowest level cache has no specific information as to whether the block in question is present in the instruction cache at the top of its' hierarchy. As such, the lowest level cache must assume that the ICBI operation must be propagated up the processor instruction cache at the top of the hierarchy in order to insure proper system operation (if the block is present in the processor instruction cache, it must be invalidated).
Note, however, that the ICBI instruction does not have to be unresolvable, and that the above example is for illustrative purposes and should not be construed in a limiting sense. If the lower level caches (below the processor) maintain information indicating which blocks have been loaded into the processor instruction cache, it is then possible to determine, at the lowest level cache, whether or not any given ICBI must be propagated up any given hierarchy (if the block is not present in the instruction cache of the processor at the top of a given hierarchy, it is unnecessary to propagate the ICBI operation up the given hierarchy) and the instruction is therefore resolvable.
Unfortunately, if the lowest level caches in a system have finite buffering resources to process unresolvable operations, a deadlock scenario can occur.
For example, consider a system with three processors and their respective multi-level cache hierarchies, assuming that each lowest level cache has only one buffer to process unresolvable operations.
To begin, all snoop buffers are idle and an unresolvable operation, herein called X, is placed on the system bus by one participant (i.e. an initiator) and snooped by two other participants (recipients). When this operation, X, is initiated onto the system bus, both recipients snoop the operation, determine that they have a buffer to process the operation, and therefore accept the operation for processing and do not retry the operation. As far as the initiator is concerned, the operation has succeeded and the initiating participant can move on to subsequent operations.
In the snooping participants, however, the snoop buffer that accepted the operation must undertake the task of propagating the operation up its' respective cache hierarchy. This process typically takes a variable amount of time, and therefore it is common for the snoop buffers, that accepted a particular operation, to finish the steps necessary to propagate the operation up the cache hierarchy at different times.
Assume that one of the snooping participants, hereinafter called A, finishes propagating the initial operation, X, up its' respective cache hierarchy, and therefore, has an available snoop buffer.
At this point, the initiator issues another unresolvable bus operation, herein called Y (first attempt). Snooping participant A detects the operation and loads its' snoop buffer to begin processing the operation. Simultaneously, the other snooping participant, hereinafter called B, detects the operation and signals retry due to the fact that it has no buffer available to process the new operation (Y). Snooping participant A has the new operation (Y) in its' buffer for processing and snooping participant B has the original operation (X) in its snoop buffer processing.
At this point, snooping participant B finishes the original operation (X) and therefore snooping participant B's snoop buffer is available. Snooping participant A is processing the second operation (Y). The initiator, in response to the initial retry indication from snooping participant B, re-initiates the second operation (Y) on the system bus (second attempt). Recall that the operation is not complete until it is not retried on the system bus. Snooping participant B will accept the operation and not retry. Snooping participant A, since its' snoop buffer is busy, will retry the operation and once again both participants are processing the second operation (Y).
Snoop participant A is processing the first attempt of the operation and snoop participant B is processing the second attempt. Snoop participant A then finishes processing operation (Y). At this point, the initiator places the operation (Y) on the bus again (third attempt). Snooper A will accept the operation and begin processing it anew. Notice that the operation has already been processed by participant A, but is repeated here because the snooper cannot determine the difference between the initial attempt of the operation and subsequent attempts necessitated by retrys. At this point both participants are processing the operation again. Snooper A is processing the third attempt and snooper B is processing the second attempt.
Further, in the current example, assume that snooper B now finishes processing the operation. At this point, snooper B is available and snooper A is processing the operation. This is the same system state as occurred when the first operation (X) completed after the second operation (Y) was first initiated onto the system bus: snooper A processing operation (Y) and snooper B available. At this point, the sequence of events above can be repeated and this process can repeat indefinitely.
The essential difficulty is due to the fact that snoopers cannot ascertain, for a given operation, whether it is being initiated for the first time or is being re-initiated due to retrys caused by the lack of availability of buffers in other snoopers. In such a situation, the snooper or snoopers that are busy "ping-pong" with the snooper or snoopers that are not busy and the process repeats indefinitely. This type of repetition is known in the industry as a "Ping-Pong deadlock".
Note that, in general, a Ping-Pong deadlock scenario can exist when there are more than two recipients as described above. In the most general case, a Ping-Pong deadlock can occur whenever one or more participants have an available snoop buffer and one or more participants do not have a snoop buffer available when an operation is re-presented by the initiator. The snooper or snoopers that have an available snoop buffer can change dynamically between re-initiations of the operation. This makes a Ping-Pong deadlock more likely to occur in systems with larger numbers of participants.
It is also possible to encounter a Ping-Pong deadlock if the snoopers have a plurality of snooping buffers. In this case it is only necessary to simultaneously have enough unresolvable operations being processed to cause some of the snooping participant's snoop buffers to fill while some others have a snoop buffer available to begin the ping-pong process between available and busy buffers.
Another difficulty caused by a Ping-Pong deadlock is that an operation can be propagated up a given cache hierarchy a number of times in response to one attempt to execute the operation by the processor. A processor could execute an operation that proceeds to an initiator and enters a ping-pong deadlock case. Even if, due to timing, the system "escapes" from the ping-pong deadlock sequence, it is possible that the operation was propagated up one or more of the cache hierarchies a number of times. This can lead to spurious system operation. Certain operations have "side effects" that preclude them from being executed twice. That is to say, an operation with side effects can execute differently depending on the state of the system, which is changed by the operation, and it is therefore essential that an operation with side-effects be presented only once to the targeted processors per execution of that operation by the initiating processor. Any means for correcting ping-pong deadlocks due to unresolvable bus operations must also insure that operations are not repeated at the target processors if the unresolvable operations have side effects.
It would, therefore, be a distinct advantage to have a method and apparatus that would allow execution of unresolvable system bus operations while avoiding a Ping-Pong deadlock and preventing replication of operations. The present invention provides such a method and apparatus.