Two factors in the usefulness of a clustered database management system (DBMS) are scalability and availability characteristics. One of the key sub-systems involved in defining these characteristics is the communications architecture for the cluster, particularly communications between nodes comprising the cluster.
A typical clustered DBMS may employ a pair of dedicated processes for performing inter-node requests. A single communications socket stream is established between the processes for communications. Requests between the nodes are routed across node boundaries via dedicated daemons. In clustered DBMS configurations which utilize a single communications socket stream, that stream may become a bottleneck at high data volumes.
In a clustered DBMS configuration utilizing multiple communications socket streams, when one of the nodes in the cluster fails, neither the time at which each communications link detects the failure nor the order in which the links detect it is defined. However, the first link failure indicates that the node is gone, and thus it is desirable to ensure that, once the node has restarted, requests from the failed node on any link are no longer processed; otherwise, a response may be sent to a node that has no knowledge of the original request in the first place. This requires that no connections be reestablished with the failed node until confirmation has been received that each receiver has purged its links and any requests received from the failed node. Furthermore, if there is any notion of session state associated with requests in progress, sufficient time must be allowed for the session state to be cleaned up before communications can be reestablished, or collisions between new requests and the old state may result. The processing required to recover communications between nodes can be time consuming and can result in slow recovery for the clustered DBMS. From an availability standpoint, this recovery and failover configuration is slow at best.
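By way of illustration only, the recovery precondition described above may be sketched as a simple gate that blocks reconnection with a failed node until every receiver has confirmed its purge and any session state has been cleaned up. The class name `ReconnectGate`, its methods, and the node and receiver identifiers are hypothetical and are introduced solely for this sketch; they are not taken from any particular DBMS.

```python
class ReconnectGate:
    """Hypothetical tracker of recovery progress for one failed node."""

    def __init__(self, failed_node, receivers):
        self.failed_node = failed_node
        # Receivers that have not yet confirmed purging their links and
        # any requests received from the failed node.
        self.pending_purge = set(receivers)
        # Becomes True once session state for in-progress requests is cleaned up.
        self.session_state_cleaned = False

    def confirm_purge(self, receiver):
        """Record that one receiver has purged its link and queued requests."""
        self.pending_purge.discard(receiver)

    def mark_session_state_cleaned(self):
        """Record that stale session state has been cleaned up."""
        self.session_state_cleaned = True

    def may_reconnect(self):
        """Reconnection is allowed only after all purges and the cleanup."""
        return not self.pending_purge and self.session_state_cleaned

    def accept_request(self, source_node):
        """Drop requests from the failed node until the gate has opened."""
        return source_node != self.failed_node or self.may_reconnect()


# Example: two receivers must both confirm before "node2" may reconnect.
gate = ReconnectGate("node2", ["recv_a", "recv_b"])
gate.confirm_purge("recv_a")
gate.mark_session_state_cleaned()
blocked = gate.may_reconnect()   # recv_b has not yet confirmed
gate.confirm_purge("recv_b")
opened = gate.may_reconnect()    # all confirmations are now in
```

Because the gate opens only after the slowest receiver has confirmed, recovery time under this scheme is bounded by the last confirmation to arrive, which is consistent with the slow recovery noted above.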
Therefore, there is a continuing need for systems and methods that provide asynchronous interconnect protocols for clustered database management systems, which can facilitate, for example, improved cluster scalability and faster node recovery processing for high-availability configurations.