1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a method for providing a cluster-wide system clock in a multi-tiered full-graph interconnect architecture for data processing.
2. Description of Related Art
Ongoing advances in distributed multi-processor computer systems have continued to drive improvements in the various technologies used to interconnect processors, as well as their peripheral components. As the speed of processors has increased, the underlying interconnect, intervening logic, and the overhead associated with transferring data to and from the processors have all become increasingly significant factors impacting performance. Performance improvements have been achieved through the use of faster networking technologies (e.g., Gigabit Ethernet), network switch fabrics (e.g., InfiniBand and RapidIO®), TCP offload engines, and zero-copy data transfer techniques (e.g., remote direct memory access). Efforts have also been increasingly focused on improving the speed of host-to-host communications within multi-host systems. Such improvements have been achieved in part through the use of high-speed network and network switch fabric technologies.
One type of multi-processor computer system known in the art is referred to as a “cluster” of data processing systems, or “computing cluster.” A computing cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
While computing clusters provide a way for separate computing devices to work in concert with each other, each of the computing devices still operates somewhat independently of the other computing devices in the computing cluster and relies upon communication between the computing devices to provide the mechanism for collaborative computing. For example, each computing device still operates using its own internal system clock signal, such that the system clock signals of the computing devices in the computing cluster are not synchronized. As a result, operations that may require or benefit from synchronization of tasks being performed on the various computing devices must employ complex synchronization mechanisms, typically provided in software outside the circuitry of the computing devices. Such synchronization mechanisms tend to cause latencies and wasted processing cycles in the computing devices, either while synchronization operations are performed or while processors wait for other processors to become synchronized.