The present invention relates generally to multi-processor computer systems. More specifically, the present invention provides techniques for building computer systems having a plurality of multi-processor clusters.
A relatively new approach to the design of multi-processor systems replaces broadcast communication among processors with a point-to-point data transfer mechanism in which the processors communicate in a manner similar to network nodes in a tightly-coupled computing system. That is, the processors are interconnected via a plurality of communication links, and requests are transferred among the processors over the links according to routing tables associated with each processor. The intent is to increase the amount of information transmitted within a multi-processor platform per unit time.
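By way of illustration only, the point-to-point mechanism described above may be sketched in software as follows. The sketch assumes a hypothetical four-processor arrangement and derives each routing table by a breadth-first search over the links. The names `LINKS`, `build_routing_tables`, and `route` are illustrative assumptions and are not drawn from any particular implementation; an actual system would typically hold such tables in hardware registers programmed at configuration time.

```python
from collections import deque

# Hypothetical topology: four processors joined in a square by
# point-to-point links (0-1, 0-2, 1-3, 2-3). Illustrative only.
LINKS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}


def build_routing_tables(links):
    """Return one table per processor mapping destination -> next hop.

    Each table is filled by a breadth-first search from its processor,
    so every entry lies on a shortest path (counted in link hops).
    """
    tables = {}
    for src in links:
        table = {}
        visited = {src}
        queue = deque()
        # Direct neighbours are reached over their own link.
        for nbr in links[src]:
            if nbr not in visited:
                visited.add(nbr)
                table[nbr] = nbr
                queue.append(nbr)
        # Farther processors inherit the first hop of the node that
        # discovered them.
        while queue:
            cur = queue.popleft()
            for nbr in links[cur]:
                if nbr not in visited:
                    visited.add(nbr)
                    table[nbr] = table[cur]
                    queue.append(nbr)
        tables[src] = table
    return tables


def route(tables, src, dst):
    """Forward a request hop by hop, consulting only the table of the
    processor that currently holds it. Returns the processors visited."""
    path = [src]
    node = src
    while node != dst:
        node = tables[node][dst]
        path.append(node)
    return path


TABLES = build_routing_tables(LINKS)

if __name__ == "__main__":
    # A request from processor 0 to processor 3 crosses two links.
    print(route(TABLES, 0, 3))
```

In this sketch no processor broadcasts a request; each forwards it over exactly one link chosen from its own table, which is the property that distinguishes the point-to-point approach from a shared broadcast bus.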
Previous implementations of such systems have had shortcomings. Some of these shortcomings relate to obtaining debugging information while the system is running. For example, prior implementations do not provide the ability to determine a configuration state of nodes in a cluster while the system is running. Instead, the system must be brought down in order to determine such configurations. Similarly, prior implementations have had a limited ability to respond to information determined during a debugging operation. For example, it would be desirable to fix problems such as deadlocks caused by dropped packets without bringing the system down. It is therefore desirable to provide methods and devices by which multiple-cluster computing systems have improved troubleshooting and debugging functionality.