The present invention relates to processing systems. More particularly, the present invention relates to multiple-node processing systems (e.g., parallel processing systems) in which the nodes require synchronization for events such as communication of processing results.
Massive cache machines (“MCMs”) and other parallel processing systems require tightly synchronized nodes to share data synchronously, regardless of the type of compute model used. For example, in tightly-coupled single instruction multiple data (“SIMD”) system models, each node runs the same instruction from a common source, but operates on its own data. In the more loosely-coupled multiple instruction multiple data (“MIMD”) system model, each node independently runs its own instructions on its own data. With reference to FIG. 1, such computer systems 100 may include parallel CPU processing nodes 101a . . . 101n, commonly coupled by a bus 104 to, for example, an interrupt control unit 103 and/or a storage unit 102.
The “compute” modes discussed above are usually accompanied by “communicate” modes during which the results of the computations must be communicated between the nodes. However, the nodes may be ready to communicate at different times, out of synchronization. Node synchronization is therefore usually required regardless of the type of compute mode employed.
To effect the required communication, without perfect synchronization, data buffering can be used on storage unit 102. The writing node deposits its data in a buffer, to be read later by the reading node. The problems with this approach are well-known, and include buffer space limitations, buffer deadlock, and extra read/write cycles.
Nodes can also be required to poll memory locations or registers that have values controlling synchronization. Polling, however, requires extra processing cycles, which can become excessive if all nodes are required to continuously poll for synchronization during their otherwise routine processing tasks.
Interrupts can also be used to synchronize nodes. However, interrupting a node from its routine processing creates problems, such as requiring the reading node to perform a context switch from its compute mode to its communicate mode. A context switch occasioned by an interrupt can be costly in terms of lost processing time, and can also interrupt critical processing.
In the previously issued, commonly assigned U.S. Pat. No. 5,778,221 entitled “System for Executing Asynchronous Branch and Link in Parallel Processor” (incorporated herein by reference in its entirety), a technique for a lighter-weight context switch is disclosed, triggered by a “special interrupt.” That patent discusses easing the costs of a context switch, but the problem of issuing the interrupt triggering the context switch remains. Improved synchronization techniques are still required to generate interrupts only when absolutely needed, and to minimize their effects on ongoing processing.
The above problems are addressed, and further advantages are provided, by the present invention which in one aspect is a method and system for synchronizing at least two processing nodes for an event. At least one state counter indicates at least two states including a first, non-impending event state within which nodes polling the state counter continue routine processing; and a second, impending event state within which nodes polling the state counter suspend routine processing to wait for the impending event. A stimulus source shifts the state counter between the at least two states; and an interrupt means is provided for generating an interrupt to synchronize non-waiting nodes for the event.
Nodes may refrain from further polling during the first, non-impending event state; but may poll while waiting during the second, impending event state.
The stimulus source may comprise a timing source, and a timing counter receiving transitions from the timing source, counting to respective, programmable state lengths, and shifting the state counter between its respective states. A respective state counter may be provided in each processing node, stimulated by a stimulus source shared by the processing nodes.
Additional states may be imposed, including a third, interrupt state at the end of which the interrupt is generated; and a fourth, event state within which the event occurs.
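By way of illustration only, the state counter and timing counter described above might be modeled in software as follows. This is a minimal sketch under stated assumptions, not the disclosed hardware: the names `State`, `StateCounter`, `lengths`, and `tick` are hypothetical, each call to `tick` stands in for one transition from the timing source, and the four-state ordering (non-impending, impending, interrupt, event) is assumed to repeat cyclically.

```python
from enum import IntEnum


class State(IntEnum):
    # Hypothetical names for the four states described in the text.
    NON_IMPENDING = 0  # nodes polling here continue routine processing
    IMPENDING = 1      # nodes polling here suspend and wait for the event
    INTERRUPT = 2      # the interrupt is generated at the end of this state
    EVENT = 3          # the event (e.g., communication) occurs


class StateCounter:
    """Software model of a state counter driven by a timing counter."""

    def __init__(self, lengths):
        # lengths maps each State to its programmable length in ticks (>= 1).
        self.lengths = dict(lengths)
        self.state = State.NON_IMPENDING
        self.ticks = 0  # timing counter: ticks elapsed in the current state

    def tick(self):
        """Consume one timing transition.

        Returns True only on the tick that ends the interrupt state,
        which is when the interrupt would be generated.
        """
        self.ticks += 1
        fire_interrupt = False
        if self.ticks >= self.lengths[self.state]:
            fire_interrupt = self.state is State.INTERRUPT
            # Shift to the next state, wrapping back to NON_IMPENDING.
            self.state = State((self.state + 1) % len(State))
            self.ticks = 0
        return fire_interrupt
```

In this sketch, one such counter could be instantiated per node and stepped by the same sequence of ticks, mirroring the arrangement in which respective state counters are stimulated by a shared stimulus source.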
The synchronization technique of the present invention is particularly useful for synchronizing parallel processing nodes for the communication of processing results, but can be used for any arbitrary event for which node synchronization is necessary.
By providing the synchronized states of the present invention, which change the operation of the nodes from routine processing to waiting as the event approaches, a balance is struck between harsh, asynchronous interrupts and excessive polling. Here, interrupts are used to interrupt non-waiting nodes only, and nodes which “arrived” during the impending event state are spared interruption and wait for the event by, e.g., polling during that time.
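The division between waiting and non-waiting nodes can be sketched as follows. This is an illustrative model only: the function names `on_poll` and `interrupt_targets`, the string state labels, and the use of `None` to represent a node that never polled are all assumptions introduced for the example and do not appear in the disclosure.

```python
# Hypothetical labels for the two polling-relevant states.
NON_IMPENDING = "non_impending"
IMPENDING = "impending"


def on_poll(state):
    """Action a node takes after polling the state counter once."""
    # In the impending event state the node suspends routine processing
    # and waits; otherwise it simply continues routine processing.
    return "wait" if state == IMPENDING else "continue"


def interrupt_targets(polled_states):
    """Return the nodes that must be interrupted for the event.

    polled_states maps a node label to the state it observed when it
    polled, or to None if it never polled (an assumption of this sketch).
    Only nodes that are not already waiting are interrupted.
    """
    return sorted(
        node for node, seen in polled_states.items() if seen != IMPENDING
    )
```

For example, if nodes 101a and 101c polled during the impending event state while node 101b polled earlier, only 101b would be a target of the interrupt in this model; 101a and 101c are already waiting and are spared.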