“Clustering” generally refers to a computer system organization where multiple computers, or nodes, are networked together to cooperatively perform computing tasks. An important aspect of a computer cluster is that all of the nodes in the cluster present a single system image; that is, from the perspective of a user, the nodes in a cluster appear collectively as a single computer, or entity.
Clustering is often used in relatively large multi-user computer systems where high performance and reliability are of concern. For example, clustering may be used to provide redundancy, or fault tolerance, so that, should any node in a cluster fail, the operations previously performed by that node will be handled by other nodes in the cluster. Clustering is also used to increase overall performance, since multiple nodes can often handle a larger number of tasks in parallel than a single computer otherwise could. Often, load balancing can also be used to ensure that tasks are distributed fairly among nodes, to prevent individual nodes from becoming overloaded and thereby maximize overall system performance. One specific application of clustering, for example, is in providing multi-user access to a shared resource such as a database or a storage device, since multiple nodes can handle a comparatively large number of user access requests, and since the shared resource is typically still available to users even upon the failure of any given node in the cluster.
As with most computer systems, clustered computer systems are often configurable so as to maximize performance within a particular application. Moreover, since communication between nodes in a clustered computer system is often a critical path in controlling system performance, many clustered computer systems have a number of configurable low-level communication parameters that control how each node operates in the system.
As an example, many clustered computer systems implement controlled fragmentation of cluster messages. Fragmentation is a process whereby large messages are broken up into multiple, smaller packets, prior to being sent across a network. The packets may be permitted to arrive at a destination in different orders, and through the utilization of identifiers in the packets, received packets may be automatically reassembled in their original order to reconstruct the original message.
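The fragmentation and reassembly process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular cluster messaging service; the `Packet` layout and the `fragment` and `reassemble` names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet layout: identifiers plus one slice of the message."""
    message_id: int  # identifies which message the fragment belongs to
    index: int       # position of this fragment within the message
    total: int       # number of fragments making up the message
    payload: bytes


def fragment(message_id: int, message: bytes, frag_size: int) -> List[Packet]:
    """Break a message into packets carrying at most frag_size payload bytes."""
    if frag_size <= 0:
        raise ValueError("frag_size must be positive")
    # An empty message still produces one (empty) packet.
    chunks = [message[i:i + frag_size]
              for i in range(0, len(message), frag_size)] or [b""]
    return [Packet(message_id, i, len(chunks), chunk)
            for i, chunk in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild one message from its packets, received in any order."""
    if not packets:
        raise ValueError("no packets received")
    total = packets[0].total
    # Index the payloads by fragment number; arrival order is irrelevant.
    by_index = {p.index: p.payload for p in packets}
    if len(by_index) != total:
        raise ValueError("missing fragments")
    return b"".join(by_index[i] for i in range(total))
```

For example, a 1024-byte message fragmented with a fragmentation size of 100 yields 11 packets, and shuffling those packets before calling `reassemble` still returns the original message.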
From the standpoint of networking hardware, fragmentation facilitates packet transmissions since the hardware is able to work with relatively small batches of data at a time. Furthermore, when combined with the reliability functionality inherent in some networking protocols such as the Transmission Control Protocol (TCP), fragmentation can reduce the amount of network traffic since packets may often be selectively resent instead of requiring an entire message to be resent in the event of a failure to deliver any portion of a message.
Some clustered computer systems implement fragmentation directly within the cluster messaging services that control the transmission of cluster messages between cluster nodes. This is typically done for the purpose of implementing reliability functionality over an underlying networking protocol such as User Datagram Protocol (UDP), which advantageously supports multicasting of messages to multiple receivers but does not natively support the same degree of reliability as other protocols such as TCP. Such systems, however, are still often built on top of a lower level networking protocol such as the Internet Protocol (IP) that natively supports fragmentation for the purpose of preventing buffer overruns in networking hardware.
Fragmentation algorithms typically rely on a fragmentation size parameter (also known as a maximum transmission unit (MTU)) that sets the maximum packet size, and thus, the places within a message along which the message is fragmented. To prevent fragmentation from occurring at the lower network layer, therefore, the fragmentation size parameter utilized in cluster messaging services must be less than or equal to the MTU for the underlying networking protocol. Furthermore, since the networking protocol MTU is hardware dependent (typically based upon the sizes of the internal buffers in the hardware), the cluster messaging service fragmentation size parameter must be set based upon the smallest networking protocol MTU for any networking hardware along the communications path between cluster nodes.
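The constraint described above reduces to taking a minimum over the hops between two nodes. The following Python sketch illustrates this under stated assumptions: the `max_fragment_size` name and the optional `header_bytes` allowance for lower-layer headers are hypothetical and are not taken from any particular system.

```python
from typing import Iterable


def max_fragment_size(hop_mtus: Iterable[int], header_bytes: int = 0) -> int:
    """Largest fragmentation size that avoids lower-layer fragmentation.

    hop_mtus:     networking protocol MTU of each device on the path
                  between two cluster nodes (hypothetical input).
    header_bytes: optional allowance for lower-layer headers that share
                  the MTU with the payload (an assumption of this sketch).
    """
    mtus = list(hop_mtus)
    if not mtus:
        raise ValueError("at least one hop MTU is required")
    # The smallest MTU on the path bounds the whole path.
    limit = min(mtus) - header_bytes
    if limit <= 0:
        raise ValueError("header allowance leaves no room for payload")
    return limit
```

For a path whose hops have MTUs of 1500, 9000, and 1500 bytes, the sketch returns 1500; if 28 bytes are reserved for headers (for instance, a 20-byte IPv4 header without options plus an 8-byte UDP header), it returns 1472.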
To minimize the overhead of packet headers, and thus maximize system performance, it is desirable to utilize as large a fragmentation size as possible. Given, however, that hardware devices may vary in different clustering environments, and that hardware devices may be replaced or upgraded over time with higher performance devices, a strong need exists for a manner of setting fragmentation size in a clustered computer system to a maximum allowable value for any particular clustered computer system.
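The relationship between fragmentation size and header overhead can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, a fixed per-packet header size; the `header_overhead` name and the numbers used are hypothetical.

```python
def header_overhead(message_len: int, frag_size: int, header_bytes: int) -> int:
    """Total header bytes needed to send one message as fragments.

    Assumes every packet carries a fixed-size header of header_bytes
    (an assumption of this sketch).
    """
    if frag_size <= 0:
        raise ValueError("frag_size must be positive")
    # Ceiling division: number of packets needed for the message.
    packets = -(-message_len // frag_size)
    return packets * header_bytes
```

With a 65,536-byte message and 28-byte headers, a fragmentation size of 512 requires 128 packets and 3,584 header bytes, whereas a fragmentation size of 1,472 requires 45 packets and 1,260 header bytes.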
Conventional clustered computer systems permit low-level communication parameters such as fragmentation sizes to be individually set on different nodes. However, such settings are typically made via configuration files that are read during startup, and thus, these parameters are often not capable of being modified without requiring a node, or an entire clustered computer system, to be taken off line and restarted.
Given the desirability of maximizing availability in clustered computer systems, it would be extremely beneficial to permit such parameters to be modified dynamically, without requiring a node or system to be taken off line. Conventional clustered computer systems, however, lack any such functionality, and thus any modifications made to such systems require at least some interruption of availability.
Moreover, it has been found that many cluster communication parameters utilized by cluster nodes are not capable of simply being modified locally on a node without some degree of coordination with other nodes. As an example, modifying the fragmentation size on a sending or source node requires coordination with any receiver or target nodes so that such target nodes process any received messages using the correct fragmentation size.
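The need for coordination can be illustrated with a hypothetical receiver that derives each fragment's position in the message from the fragment index and its own locally configured fragmentation size. This is only a sketch of one possible receiver design, with names introduced here for illustration; it is not drawn from any particular clustered computer system.

```python
from typing import List, Tuple


def send_fragments(message: bytes, frag_size: int) -> List[Tuple[int, bytes]]:
    """Source node: split a message into (index, payload) fragments."""
    return [(i // frag_size, message[i:i + frag_size])
            for i in range(0, len(message), frag_size)]


def receive_fragments(fragments: List[Tuple[int, bytes]],
                      receiver_frag_size: int,
                      total_len: int) -> bytes:
    """Target node: place each payload at index * receiver_frag_size.

    The offset is computed from the receiver's own fragmentation size,
    so the result is only correct if it matches the sender's setting.
    """
    buf = bytearray(total_len)
    for index, payload in fragments:
        offset = index * receiver_frag_size
        buf[offset:offset + len(payload)] = payload
    return bytes(buf[:total_len])
```

When both nodes use a fragmentation size of 8, a 16-byte message is reconstructed correctly. If the source node switches to 8 while the target node still assumes 4, the second fragment is written at the wrong offset and the reconstructed message no longer matches the original.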
Therefore, a significant need exists in the art for a manner of reliably modifying cluster communication parameters in a clustered computer system with reduced effect on system availability.