All of the material in this patent application is subject to copyright protection under the copyright laws of the United States and of other countries. As of the first effective filing date of the present application, this material is protected as unpublished material. However, permission to copy this material is hereby granted to the extent that the copyright owner has no objection to the facsimile reproduction by anyone of the patent documentation or patent disclosure, as it appears in the United States Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Not Applicable
1. Field of the Invention
The present invention relates to a method and apparatus for a distributed parallel processing system in which there is a need for a single Time Of Day (TOD) or incrementor to synchronize (order) events across the different processors.
2. Description of the Related Art
To accurately order events that occur on a distributed processing system, the difference between TODs read simultaneously on two different processors must be less than the smallest latency for a message between the two processors, and each TOD must be continuously increasing. Today this is done either by having a single central TOD repository or by running the entire system off a single clock propagated to all the processors (nodes). The central repository has the disadvantage of being a processing bottleneck: because all nodes must query this single node, the queries are serialized, which reduces the speedup that parallelizing the application is intended to achieve. Using a single system oscillator that is propagated through the system has the disadvantage that the system fails unless specific links between nodes are operational. Since on a large distributed system these links are necessarily cables that can be accidentally pulled, this is a real concern.
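The ordering requirement stated above can be written out as a short worked inequality. The symbols below are not part of the original text and are introduced only for illustration: T_i(t) is the TOD read on node i at real time t, and lambda_ij is the smallest message latency between nodes i and j. The sketch also assumes that the TOD on node j advances at its nominal rate (one TOD unit per unit of real time) while the message is in flight.

```latex
% Requirement on any two nodes i and j, for all real times t:
\[
  \lvert T_i(t) - T_j(t) \rvert < \lambda_{ij},
  \qquad T_i,\ T_j \text{ continuously increasing.}
\]
% Event a occurs on node i at time t_a and causes a message to node j,
% where event b is observed at time t_b \ge t_a + \lambda_{ij}.
% Assuming T_j advances at its nominal rate over [t_a, t_b]:
\[
  T_j(t_b) \;\ge\; T_j(t_a) + \lambda_{ij}
           \;>\; \bigl(T_i(t_a) - \lambda_{ij}\bigr) + \lambda_{ij}
           \;=\; T_i(t_a).
\]
```

Under these assumptions the timestamp of the receiving event is always greater than the timestamp of the sending event, so the TOD values order the two events correctly.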
The present invention overcomes the disadvantages of the known art by a hardware and software implementation that allows all the processing nodes to increment their TOD off a local oscillator and keeps them within the minimum latency by broadcasting periodic updates. In essence, the invention provides a TOD incrementor throughout a distributed system without requiring a clock to be distributed, and with a maximum difference less than the minimum latency, as opposed to prior art solutions that either had a larger maximum difference or distributed a clock to run the TOD synchronously. Through the use of the present invention, reading of the TOD is distributed across all of the processing nodes, thereby removing the need for a single centralized TOD source. The present invention also removes the need for specific cables to be operational (as in a single system oscillator design). As long as there is a path from a “TOD Master Chip” to a node, the TOD is kept within tolerance.
The present invention utilizes a switch design that is implemented in hardware and software to accurately update and distribute a received TOD to all its neighbors. The method and apparatus of the present invention includes the capability of calculating the delay of the cables connected to the switch designated as the “TOD Master Chip”, and the capability to receive a special TOD broadcast packet and update it with the calculated cable delay. The invention further includes the capability to adjust the local TOD to the received broadcast value without decrementing the value, and the capability to send the TOD broadcast packet to all its neighbors. The last-mentioned capability includes detecting whether a neighbor has already received a specific update, in which case the received packet is discarded (ending the feedback loop).
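The receive, update, and forward behavior described above can be illustrated with a minimal software sketch. It is a toy model under stated assumptions, not the hardware design: the class name `TodSwitch`, its methods, the use of an integer sequence number to recognize a repeated update, and cable delays expressed in TOD ticks are all hypothetical. Holding the local value with `max()` stands in for the "never decrement" adjustment, and delivery is modeled as an immediate function call.

```python
class TodSwitch:
    """Toy model of a switch that relays TOD broadcasts (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.links = {}       # neighbor switch -> cable delay in TOD ticks (assumed known)
        self.last_seq = None  # sequence number of the last update accepted
        self.tod = 0          # local TOD value

    def connect(self, other, delay):
        # A cable is bidirectional and has the same delay in both directions here.
        self.links[other] = delay
        other.links[self] = delay

    def broadcast(self, tod, seq):
        # Called on the switch acting as the master to start an update.
        self.tod = tod
        self.last_seq = seq
        self._forward(tod, seq)

    def receive(self, packet_tod, seq, sender):
        if seq == self.last_seq:
            return False  # already saw this update: discard, ending the loop
        self.last_seq = seq
        # Update the received value with the delay of the cable it arrived on.
        corrected = packet_tod + self.links[sender]
        # Simplification: never move the local TOD backward.
        self.tod = max(self.tod, corrected)
        self._forward(corrected, seq)
        return True

    def _forward(self, tod, seq):
        # Send to all neighbors; duplicates are dropped by the receiver.
        for neighbor in self.links:
            neighbor.receive(tod, seq, self)
```

Connecting three such switches in a ring and calling `broadcast` on one of them shows the update reaching every switch exactly once, with later copies of the same sequence number discarded.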
In the method and apparatus of the present invention, the TOD adjustments can be simple loads of the received broadcast when the newly received value is greater than the local copy. However, if the local copy is greater than the received broadcast, the local copy must be held or incremented at a slower rate until the difference is removed (in the design according to the invention, the local value is incremented at a fractional value, e.g. half speed, until it matches the received value which is being incremented at normal speed).
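The adjustment rule above can be sketched in a few lines. This is an illustrative model only: the class `SlewingTod`, its method names, and the choice to realize "half speed" by skipping every other tick are assumptions, not details taken from the design.

```python
class SlewingTod:
    """Local TOD that loads larger values and slews down to smaller ones."""

    def __init__(self, value=0):
        self.value = value  # local TOD count
        self.excess = 0     # ticks by which the local copy leads the received value
        self._skip = False  # toggles so that every other tick is dropped while slewing

    def apply_broadcast(self, received):
        self._skip = False
        if received >= self.value:
            # Received value is ahead (or equal): a simple load is safe.
            self.value = received
            self.excess = 0
        else:
            # Local copy is ahead: remember the lead and run at half speed.
            self.excess = self.value - received

    def tick(self):
        """Advance by one period of the local oscillator."""
        if self.excess > 0:
            self._skip = not self._skip
            if self._skip:
                # The received value advanced this tick but the local copy
                # did not, so the lead shrinks by one. The value never decreases.
                self.excess -= 1
                return
        self.value += 1
```

With a local value of 10 and a received value of 6, the local copy leads by 4 ticks; running at half speed closes that gap after 8 ticks, at which point both values are 14 and normal incrementing resumes.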
From a system perspective, the invention is implemented with the capability to select a single switch or node to be the “TOD Master Chip”. This “TOD Master Chip” periodically initiates a TOD broadcast by sending TOD broadcast packets to all of its neighbors. The TOD broadcast packet contains the current TOD value and a sequence number (used to detect feedback loops). The TOD broadcast packet may also include a flag indicating that it is a TOD broadcast packet, rather than a normal data packet, and may also include a checksum to detect errors in the packet. For system reliability, a backup that monitors the “TOD Master Chip” and takes over its function when that chip fails may be provided.
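One possible encoding of such a packet is sketched below. The field widths, byte order, flag value, and the use of CRC-32 as the checksum are all assumptions made for illustration; the text specifies only that the packet carries a TOD value and a sequence number, and may carry a flag and a checksum. The function names `encode_tod_packet` and `decode_tod_packet` are hypothetical.

```python
import struct
import zlib

TOD_FLAG = 0x01                  # assumed marker for "this is a TOD broadcast packet"
_BODY = struct.Struct(">BQH")    # flag (1 byte), TOD (8 bytes), sequence number (2 bytes)
_CRC = struct.Struct(">I")       # CRC-32 over the body (4 bytes)


def encode_tod_packet(tod, seq):
    """Build a TOD broadcast packet from a TOD value and a sequence number."""
    body = _BODY.pack(TOD_FLAG, tod, seq & 0xFFFF)
    return body + _CRC.pack(zlib.crc32(body))


def decode_tod_packet(data):
    """Return (tod, seq), or None if the packet is malformed or not a TOD packet."""
    if len(data) != _BODY.size + _CRC.size:
        return None
    body, crc = data[:_BODY.size], data[_BODY.size:]
    if _CRC.unpack(crc)[0] != zlib.crc32(body):
        return None  # checksum mismatch: the packet was damaged in transit
    flag, tod, seq = _BODY.unpack(body)
    if flag != TOD_FLAG:
        return None  # a normal data packet, not a TOD broadcast
    return tod, seq
```

A receiver would call `decode_tod_packet` on each incoming packet and ignore any result of `None`, so damaged packets and ordinary data packets never affect the local TOD.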