1. Technical Field
The present invention relates in general to a data processing system and in particular to a multi-processor data processing system. Still more particularly, the present invention relates to scalable shared memory multi-processor data processing systems.
2. Description of the Related Art
Symmetrical multiprocessing (SMP) and Non-Uniform Memory Access (NUMA) architectures are scalable data processing technologies that utilize multiple processors and shared memory to handle large applications or multiple applications at the same time. Scalable shared memory multi-processors are often built by interconnecting symmetric shared memory multi-processor systems, each with a relatively small number of processors, using an interconnect that maintains cache coherency. Doing so makes good use of other, pre-existing and often high volume products to create larger systems. The result is a cache-coherent non-uniform memory access multi-processor (ccNUMA or simply NUMA). In addition, some architectures such as the PowerPC™ (a product of International Business Machines of Armonk, N.Y.) provide individual processor time registers that increment at some divisor of the processor's own frequency; on the PowerPC the register is called the "time base register." The PowerPC architecture requires that the program-perceptible values of the time base on a multi-processor system increase monotonically, meaning that if a program reads the time base and then reads it again, the second value must be greater than or equal to the first value.
The values of the time base registers on multiple processors have to be close enough to each other that if a program runs first on one processor and then on another, the program reads a second time base value that is greater than or equal to the first one. The time to move a program from one processor to another is greater than 10^3 processor cycle times and the time base divisor is on the order of tens of cycles, which forces a multi-node NUMA system to synchronize the time base registers of all the processors in the system to within approximately 10^2 time base ticks of each other. Time will be expressed in this disclosure in units of the time base cycles or values, and the terms "cycle" and "tick" are used interchangeably.
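The arithmetic behind the tolerance can be illustrated with a minimal sketch. It assumes, for illustration only, a migration time of roughly a thousand processor cycles and a divisor of ten cycles per tick; the function names are hypothetical and are not part of this disclosure.

```python
def migration_ticks(migration_cycles: int, cycles_per_tick: int) -> int:
    """Time base ticks that elapse while a program moves between processors."""
    return migration_cycles // cycles_per_tick


def second_read_is_monotonic(first_read: int, skew_ticks: int,
                             elapsed_ticks: int) -> bool:
    """Check the worst case: the destination time base lags by skew_ticks.

    The second read sees the lagging time base plus the ticks that
    elapsed during the migration.
    """
    second_read = first_read - skew_ticks + elapsed_ticks
    return second_read >= first_read


# Illustrative figures only: about a thousand cycles per migration,
# ten processor cycles per time base tick.
elapsed = migration_ticks(1000, 10)
```

Under these assumed figures about one hundred ticks elapse during a migration, so a skew of up to one hundred ticks between any two time base registers cannot cause a program to observe time running backwards.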
There is often no common oscillator on a NUMA system, and the time base registers drift apart from each other over time, so the time base registers must also be re-synchronized with each other periodically. Although some hardware interconnection mechanisms do have a common oscillator that can be used for this purpose, and others have a special packet format that carries a time value in its payload and ages this value as it is transmitted through the network, such hardware is not always present. Thus, some mechanism using standard hardware and appropriate logic is required. Such a mechanism must maintain the required level of synchronization without being too expensive in terms of network load or specialized hardware.
It would be desirable, therefore, to provide a time base synchronization system for a multi-node NUMA multi-processor system. It is further desirable that the synchronization system be used with current interconnect implementations requiring no specialized hardware features. It would also be desirable to provide the synchronization system without imposing significant overhead on either the interconnect or the processors within the system.
It is therefore one object of the present invention to provide a time base synchronization system for a multi-node NUMA multi-processor system that will utilize available interconnect implementations without requiring specialized hardware features.
It is another object of the present invention to provide a time base synchronization system for a multi-node NUMA multi-processor system that does not impose significant operating overhead on the interconnect or processors in the system.
The foregoing objects are achieved as is now described. In a multi-node non-uniform memory access (NUMA) multi-processor system, a designated node synchronization processor on each node is synchronized. Individual nodes accomplish internal synchronization of the other processors on each node utilizing well known techniques; thus it is sufficient to synchronize one processor on each node. Node zero, a designated system node that acts as a synchronization manager, estimates the time it takes to transmit information in packet form to a particular remote node in the system. A time value is then transmitted from the remote node to node zero. Node zero projects the current time on the remote node based on the transmission time estimate and compares that projection with its own time. Node zero then either updates its own clock to catch up with a leading remote node or sends a new time value to the remote node, requiring the remote node to advance its time to catch up with that on node zero. Code on the remaining nodes is mostly passive, responding to packets coming from node zero and setting the time base value when requested. Monotonicity of the time bases is maintained by always advancing the earlier of the two time bases so as to catch up with the later one.
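The comparison step performed by node zero can be sketched as follows. This is a simplified illustration rather than the claimed implementation: the names `TimeBase` and `reconcile` are hypothetical, the transmission time estimate is taken as a given input, and packet delivery is not modeled, so the value sent to the remote node is not compensated for its own delivery delay.

```python
from dataclasses import dataclass


@dataclass
class TimeBase:
    """Hypothetical stand-in for one node's time base register."""
    ticks: int

    def advance_to(self, value: int) -> None:
        # Only ever move forward, so successive reads stay monotonic.
        if value > self.ticks:
            self.ticks = value


def reconcile(node_zero: TimeBase, remote: TimeBase,
              reported_remote_ticks: int, transit_estimate: int) -> str:
    """Advance whichever time base is earlier so it catches up with the later.

    reported_remote_ticks is the time value carried in the packet from the
    remote node; transit_estimate is node zero's estimate of how many ticks
    the packet spent in the interconnect.
    """
    # Project what the remote time base reads now, not when the packet left.
    projected_remote = reported_remote_ticks + transit_estimate

    if projected_remote > node_zero.ticks:
        # Remote node leads: node zero catches up.
        node_zero.advance_to(projected_remote)
        return "node_zero_advanced"
    if projected_remote < node_zero.ticks:
        # Node zero leads: the remote node is asked to catch up.
        remote.advance_to(node_zero.ticks)
        return "remote_advanced"
    return "in_sync"
```

Because `advance_to` ignores any value that is not later than the current one, neither time base is ever set backwards in this sketch, which reflects the monotonicity requirement described above.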
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.