It is known to provide a number of computing devices that can communicate with each other. The computing devices can be managed such that they appear to a user to behave as a single computer system. Such an arrangement is called a computing cluster, and the individual computing devices are known as nodes in the cluster. A computing cluster can, for example, provide cost savings when compared to a single supercomputer of comparable processing power.
FIG. 1 shows a simple computing cluster, comprising a plurality of nodes 102 to 108. Each node 102 to 108 is connected to one or more other nodes by an interconnect 110. Each interconnect 110 may be a standard communication link such as, for example, Ethernet. Alternatively, the interconnects 110 may be specialist hardware such as Memory Channel, available from the Hewlett-Packard Company. Each node 102 to 108 includes appropriate hardware for communicating with the rest of the cluster via the interconnects 110 such as, for example, Ethernet hardware. Data sent across at least one cluster interconnect 110 is known as cluster interconnect traffic. Traffic to be sent by a node 102 to 108 is first held in a first-in first-out (FIFO) queue (an interconnect queue) before being transmitted. Packets arriving at a node are also held in an incoming FIFO queue before being processed. If a process on a node creates traffic to be transmitted, but the interconnect queue is full and can hold no more traffic, then some or all of the traffic is dropped. As a result, some or all of the traffic is never transmitted.
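The drop-on-full behaviour of such an interconnect queue can be sketched as follows. This is a minimal illustration only; the class and method names are ours and do not correspond to any actual cluster software.

```python
from collections import deque

class InterconnectQueue:
    """Illustrative bounded FIFO interconnect queue.

    Packets enqueued while the queue is at capacity are dropped and
    never transmitted, mirroring the behaviour described above.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = 0   # count of packets lost to a full queue

    def enqueue(self, packet):
        if len(self.packets) >= self.capacity:
            self.dropped += 1   # queue full: traffic is dropped
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        # Transmit the oldest packet first (first in, first out).
        return self.packets.popleft() if self.packets else None
```

For example, with a capacity of two, a third packet enqueued before any transmission takes place is silently dropped.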
Each node 102 to 108 also includes node management software which enables the computing device to function as a node in the cluster. Examples of existing node management software include TruCluster and OpenSSI.
Nodes of a computing cluster work together to achieve a common purpose. The main types of computing cluster are as follows:
1. High performance (HP) clusters, which are constructed to run parallel programs (for example weather simulations and data mining) to accomplish a task given to the computing cluster. While a master node typically drives the cluster nodes, the nodes work together and communicate to accomplish the task.
2. Load-levelling clusters, which are constructed to allow a user on one node to spread their workload transparently to other nodes in the cluster. This can be very useful for computationally intensive, long-running jobs that are not massively parallel.
3. Web-service clusters, which implement a form of load levelling. Incoming web service requests are load-levelled between a set of standard servers. This could be regarded as a farm rather than a cluster, since the server nodes do not typically work together.
4. Storage clusters, which consist of nodes that supply parallel, coherent and highly available access to file system data.
5. Database clusters, which consist of nodes that supply parallel, coherent and highly available access to a database. The database cluster is the most visible form of application-specific cluster.
6. High availability clusters, which are also often known as failover clusters. Resources, most importantly applications and nodes, are monitored, and scripts are run when failure of a node is detected. Scripts are used to transfer applications and services running on the failed node onto another node.
Traffic often takes the form of a number of small data packets. The latency of a data packet is the time between creating the packet to be transmitted on one node and receipt of the packet at a destination node. Latency of a particular data packet can be affected by, for example, the speed of the interconnect 110 and the amount of traffic to be sent on the interconnect 110. If for example a burst of data packets is produced in a node at a rate faster than the speed of the interconnect 110, some data packets will be held in an interconnect queue while the data packets are transmitted across the interconnect 110 at a slower rate. The latency of the packets held in the queue will therefore increase when compared to packets which are transmitted instantly, for example if the interconnect queue is empty and there is little other traffic on the interconnect 110.
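The queueing effect described above can be illustrated with a simple calculation. The function below is ours, and it makes deliberately simplifying assumptions: the burst arrives instantaneously at an empty queue, there is no other traffic, and the interconnect transmits exactly one packet per fixed interval.

```python
def burst_latencies(burst_size, per_packet_ms):
    """Queueing latency (in milliseconds) of each packet in a burst
    arriving at an empty interconnect queue faster than the link can
    transmit. The i-th packet must wait for the i packets ahead of it
    to be sent before its own transmission begins."""
    return [i * per_packet_ms for i in range(burst_size)]
```

For a burst of five packets on a link taking 1 ms per packet, the latencies are 0, 1, 2, 3 and 4 ms: the last packet in the queue waits 4 ms longer than a packet transmitted instantly into an empty queue.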
The computing cluster may include shared resources. An example is a shared mass storage device. Any of the nodes 102 to 108 in the cluster may access the device (for example to read or write data). However, only a single node may access the device at any one time. A node accesses the device by first locking the device such that no other node may access the device. The node then accesses the device, and then releases the device so that it is no longer locked and other nodes may attempt to lock and access the device. A distributed lock manager (DLM), which is a kernel subsystem, has the responsibility of managing resources in this way. Other examples of shared resources in a cluster include shared processors and shared memory.
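The lock-access-release discipline enforced by a distributed lock manager can be sketched as follows. This is an illustration only: a thread lock stands in for a cluster-wide DLM lock, and the class and method names are ours.

```python
import threading

class SharedResource:
    """Illustrative shared resource guarded by a lock, standing in
    for a mass storage device managed by a distributed lock manager."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for a DLM lock
        self.data = []

    def write(self, node_id, value):
        # Lock the resource so that no other node may access it,
        # perform the access, then release so others may lock it.
        with self._lock:
            self.data.append((node_id, value))
        # On exit from the with-block the lock is released.
```

Only one accessor at a time can hold the lock; any other node attempting to lock the resource must wait until it is released.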
Heavy traffic on the cluster interconnects can adversely affect the performance of the cluster. For example, it can delay communication between crucial kernel subsystems or lead to poor utilisation of cluster resources.
For example, a shared resource that has been locked and then released by a node effectively remains locked until the other nodes are informed of the release. The information that the resource has been released might be delayed (and the latency of the data packets containing that information might increase) if traffic rates on the cluster interconnects are high. This can lead to wasted processor cycles on any node waiting for the resource, and also to poor utilisation of the shared resource, as the resource is not used again until the waiting node is informed that it has been released.
Furthermore, operation of the cluster depends on many kernel subsystems communicating across the cluster. These subsystems periodically exchange messages. For example, the cluster management software TruCluster has a connection manager kernel subsystem. This subsystem generates health, status and synchronisation messages. The connection manager subsystem uses these messages to monitor the state of the cluster and its members and to coordinate membership and application placement within the cluster. If a cluster node generates health messages which are subject to high latency, then the node may be treated as non-responsive and may be restarted. Since the cluster node will then be non-operational, at least for a short time, this will lead to degradation of cluster performance.
A solution to these problems provided by TruCluster is to provide a high cost, low latency dedicated hardware interconnect structure such as Memory Channel. This is not always available to a cluster designer and is undesirable due to the high associated cost. When excessive traffic is produced by a cluster node and the interconnect queue is full, packets are dropped.
OpenSSI defines two channels, one used exclusively by crucial kernel services, and the other for all other types of traffic. Each channel is associated with a FIFO interconnect queue, so there are two interconnect FIFO queues. There is no queue throttling (dropping of traffic) on the channel used by crucial kernel services, as there is no size limit for the associated FIFO queue. The other channel is throttled, i.e. traffic will be dropped when interconnect traffic is high and the associated interconnect queue is full. This solution does not guarantee low latency for traffic in the channel used by crucial kernel services: when a large amount of traffic has accumulated in the unthrottled queue, the latency of the packets in that queue will still be high.
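The two-channel scheme described above can be sketched as follows. This is an illustration only; the class, names and the capacity value are ours and do not reflect OpenSSI internals.

```python
from collections import deque

class TwoChannelInterconnect:
    """Illustrative two-channel scheme: the kernel channel is never
    throttled (unbounded queue), while the general channel drops
    traffic once its queue is full."""

    GENERAL_CAPACITY = 4  # assumed limit, for illustration only

    def __init__(self):
        self.kernel_queue = deque()    # unbounded: never throttled
        self.general_queue = deque()   # bounded: throttled

    def send(self, packet, kernel=False):
        if kernel:
            self.kernel_queue.append(packet)  # always accepted
            return True
        if len(self.general_queue) >= self.GENERAL_CAPACITY:
            return False                      # queue full: dropped
        self.general_queue.append(packet)
        return True
```

Note that although kernel traffic is never dropped, a kernel packet arriving behind many queued kernel packets still waits for all of them to be transmitted, which is why this scheme cannot guarantee low latency.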
It is an object of embodiments of the invention at least to mitigate at least some of the problems of the prior art.