1. Technical Field
The present invention generally relates to processing of global operations in multiprocessor systems and in particular to employing tokens to permit speculative execution of global operations within multiprocessor systems. Still more particularly, the present invention relates to implementing a bus snooper employing tokens for speculative execution of global operations within a multi-processor system.
2. Description of the Related Art
Many operations performed within multiprocessor systems may be executed locally by a single processor without immediately involving or affecting other processors within the system. For example, a processor may write a modified cache line to a local cache without making the write operation immediately visible to all other processors. A write-back of the modified data to system memory may be deferred until a later time or combined, through a modified intervention, with a subsequent read operation by a different processor for the same cache line.
However, processors within multiprocessor systems periodically execute operations which must be globally visible to all other processors within the system. By their nature, these operations require the involvement of all other processors. For example, within the PowerPC architecture, a processor may execute an instruction cache clock invalidate (ICBI), translation lookaside buffer invalidate (TLBI), or synchronization (SYNCH) operation. A synchronizing operation, for instance, may be employed to allow prior instructions within an instruction stream executing on a pipelined, out-of-order multiprocessor system to complete before performing a context switch.
Existing designs for multiprocessor systems support global operations by implementing a queue for such operations within each processor for every other processor within the system. That is, a processor within a system havign three other processors will include three queues for snooping global operations. The depth of each snoop queue will equal the latency of the combined response in order to prevent system livelocks. Thus, where a system requires five bus cycles to generate a combined response to an address transaction, the global operation queues will have a pipeline which is five levels deep.
This approach to supporting global operations is extremely hardware intensive and is not scalable. As the operating frequency and the number of processors within a system increases, driving the latency of a combined response up to close to 100 cycles, the approach described above becomes unwieldy. As the window for the combined response becomes larger, snooper implementations become more complex and costly.
It would be desirable, therefore, to to broadcst global operations in a highly scalable multiprocessor system while keeping masters and snoopers as simple as possible but also preventing system livelocks. It would also be desirable to decouple the depth of snoop queues from the width of address to combined response windows, and to maintain high frequency oepration while increasing the number of processor in a system supporting global operations.
It is therefore one object of the present invention to provide improved processing of global operations in multiprocessor systems.
It is another object of the present invention to provide a mechanism for employing tokens to permit speculative execution of global operations within multiprocessor systems.
It is yet another object of the present invention to provide a bus snooper employing tokens for speculative execution of global operations within a multiprocessor system.
The foregoing objects are achieved as is now described. Only a single snooper queue for global operations within a multiprocessor system is implemented within each bus snooper, controlled by a single token allowing completion of one operation. A bus snooper, upon detecting a combined token and operation request, begins speculatively processing the operation if the snooper is not already busy. The snooper then watches for a combined response acknowledging the combined request or a subsequent token request from the same processor, which indicates that the originating processor has been granted the sole token for completing global operations, before completing the operation. When processing an operation from a combined request and detecting an operation request (only) from a different processor, which indicates that another processor has been granted the token, the snooper suspends processing of the current operation and begins processing the new operation. If the snooper is busy when a combined request is received, the snooper retries the operation portion of the combined request and, upon detecting a subsequent operation request (only) for the operation, begins processing the operation at that time if not busy. Snoop logic for large multiprocessor systems is thus simplified, with conflict reduced to situations in which multiple processors are competing for the token.
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.