1. Technical Field
The present invention relates in general to synchronization of processing in multiprocessor systems and in particular to synchronization of bus operations on a multiprocessor system bus. Still more particularly, the present invention relates to an improved method and system for processing a load instruction subsequent to a synchronization instruction.
2. Description of the Related Art
Programmers writing software for execution on multiprocessor data processing systems often need or desire to provide points within the flow of instruction execution serving as processing boundaries, ensuring that all storage accesses within a first code segment are fully executed before any storage accesses within a subsequent code segment are executed. This is particularly true when the multiprocessor system includes superscalar processors supporting out-of-order instruction execution and weak memory consistency. The instruction sets supported by most popular commercial processors include an instruction for setting such a processing boundary. In the PowerPC™ family of processors, for example, the instruction which may be employed by a programmer to establish a processing boundary is the synchronization or “sync” instruction. The sync instruction orders the effects of storage access execution. All storage accesses initiated prior to the sync instruction appear to have completed before the sync instruction completes, and no subsequent storage accesses appear to be initiated until the sync instruction completes. Thus, the sync instruction creates a boundary having two significant effects. First, storage accesses which follow the sync instruction within the instruction stream will not be executed until all storage accesses which precede the sync instruction in the instruction stream have completed. Second, storage accesses following a sync instruction within the instruction stream will not be reordered for out-of-order execution with storage accesses preceding the sync instruction.
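The ordering rule described above can be illustrated with a minimal sketch. It is a toy model of the architecturally visible order only, not of any processor implementation. The names `SYNC`, `sync_epochs`, and `respects_sync` are hypothetical and are introduced solely for illustration, and each storage access is assumed to carry a unique label.

```python
from typing import Dict, List

SYNC = "sync"  # hypothetical marker for a sync instruction in the toy program


def sync_epochs(program: List[str]) -> Dict[str, int]:
    """Map each storage access to the number of sync instructions preceding it."""
    epochs: Dict[str, int] = {}
    epoch = 0
    for op in program:
        if op == SYNC:
            epoch += 1  # every sync opens a new ordering segment
        else:
            epochs[op] = epoch
    return epochs


def respects_sync(program: List[str], executed: List[str]) -> bool:
    """Return True if `executed` never moves a storage access across a sync.

    Accesses within one segment may appear in any order; accesses from a
    later segment must not appear before accesses from an earlier segment.
    """
    epochs = sync_epochs(program)
    # The observed order must contain exactly the accesses of the program.
    if sorted(executed) != sorted(epochs):
        return False
    seen = [epochs[op] for op in executed]
    return all(a <= b for a, b in zip(seen, seen[1:]))
```

For the program `["st A", "ld B", "sync", "ld C", "st D"]`, the observed order `["ld B", "st A", "ld C", "st D"]` is accepted because the reordering stays within one segment, while `["st A", "ld C", "ld B", "st D"]` is rejected because `ld C` appears ahead of an access that precedes the sync.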
For a processor initiating a storage access to cacheable data, the sync instruction acts as a barrier: storage accesses that follow the sync instruction are not executed until the sync instruction completes. Previous processors handle the synchronization instruction (sync) by stalling subsequent storage accesses. The disadvantage of this approach is that the sync instruction takes many processor cycles to complete on the bus, so storage instructions back up behind the sync instruction during the wait, delaying the time at which cacheable data may be processed. Storage accesses that follow the sync instruction therefore start only when the sync completes on the bus, and internal cache arbitration then further delays the backed-up storage operations.
It would be desirable, therefore, to provide a method and system for processing storage accesses to internal cacheable data that are initiated after the sync instruction and executed before the sync instruction completes. It would further be advantageous if the method discarded the internal cacheable data returned for storage accesses after the sync instruction whenever a snoop operation kills the line before the sync instruction completes, thereby allowing the storage access to flush the data and go back out to the bus for new data.
It is therefore one object of the present invention to provide an improved method and system for synchronization of processing in multiprocessor systems.
It is another object of the present invention to provide a method and system for processing storage accesses of internal cacheable data initiated after the synchronization instruction and executed during the synchronization instruction.
It is yet another object of the present invention to provide a method and system in which, if a snoop operation kills the line before the synchronization instruction completes, storage accesses flush internal cacheable data that has been returned to the processor subsequent to a synchronization operation and go back out to the bus for new data.
The foregoing objects are achieved as is now described. The method and system of the present invention for processing storage accesses within a multiprocessor system subsequent to a synchronization instruction by a local processor includes determining whether data for the storage accesses is a “cacheable hit,” wherein the storage accesses return the data to the local processor from an internal cache. Each such storage access has an entry in an interrupt table, which is used to discard the returned data if a snoop kills the line before the synchronization instruction completes. After the cache returns the data to the processor, a return data bit is set in the interrupt table. A snoop killing the line sets a snooped bit in the interrupt table. Upon completion of the synchronization instruction, any entries in the interrupt table subsequent to the synchronization instruction that have both the return data bit and the snooped bit set are flushed. The flush occurs because the data returned to the local processor due to a “cacheable hit” subsequent to the synchronization instruction was out of order with the snoop, and the processor must flush the data and go back out to the system bus for the new data. If the processor did not flush the data, it would use old data, thus violating the architecture.
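The bookkeeping described above can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the hardware implementation: the class and method names (`Entry`, `InterruptTable`, `issue_load`, `cache_returns_data`, `snoop_kill`, `sync_complete`) are hypothetical, a single pending synchronization instruction is assumed, and a snoop is assumed to mark every post-sync entry whose cache line matches.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    line: int                    # cache line address targeted by the load
    after_sync: bool             # load follows a still-pending sync
    data_returned: bool = False  # "return data bit"
    snooped: bool = False        # "snooped bit"


class InterruptTable:
    """Toy model of the interrupt table (names are illustrative assumptions)."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def issue_load(self, line: int, after_sync: bool) -> Entry:
        """Allocate a table entry for a load that hits in the internal cache."""
        entry = Entry(line=line, after_sync=after_sync)
        self.entries.append(entry)
        return entry

    def cache_returns_data(self, entry: Entry) -> None:
        """The internal cache has returned data to the processor."""
        entry.data_returned = True

    def snoop_kill(self, line: int) -> None:
        """A snoop kills `line`; mark matching loads issued after the sync."""
        for entry in self.entries:
            if entry.after_sync and entry.line == line:
                entry.snooped = True

    def sync_complete(self) -> List[Entry]:
        """On sync completion, remove and return the entries to be flushed.

        An entry is flushed only if both bits are set: its data was returned
        early and the line was later killed, so the data may be old.
        """
        flushed = [e for e in self.entries
                   if e.after_sync and e.data_returned and e.snooped]
        self.entries = [e for e in self.entries
                        if not (e.after_sync and e.data_returned and e.snooped)]
        for entry in self.entries:
            entry.after_sync = False  # surviving loads are no longer speculative
        return flushed
```

In this sketch, a load whose line is killed after its data has been returned appears in the list returned by `sync_complete` and would be reissued to the system bus, while a load whose line was never snooped keeps the data it obtained from the internal cache.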
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.