This disclosure relates generally to database clustering, and more specifically, to automated local database connection affinity and failover to a distributed connection when a local database fails in a database clustering environment.
The term “database cluster” may refer to two or more compute nodes (e.g., server computing devices). Each compute node includes or is directly associated with a storage device (or devices) that stores a database. The databases associated with the respective compute nodes may be identical, i.e., a single database is replicated on the storage device associated with each compute node. Alternatively, a single database may be partitioned and the databases associated with the respective compute nodes may each contain one of the partitions of the single database. Database clustering may be useful for providing continuous availability of data in case one or more compute nodes or associated databases fail (e.g., because of a failed processor, failed connection path, failed storage device, etc.). When a database fails, a failover operation may be executed such that any database operation that was undertaken but not completed by the failed database is assumed by a different database within another compute node.
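By way of illustration only, the failover operation described above may be sketched as follows for a replicated cluster. The names `Node`, `Cluster`, and `fail_over` are hypothetical and are not drawn from any particular database product, and selecting the first healthy node as the target is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node holding a replica of the database."""
    name: str
    healthy: bool = True
    # Operations that were undertaken but not yet completed on this node.
    pending: list = field(default_factory=list)


class Cluster:
    """Hypothetical cluster of two or more compute nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def fail_over(self, failed):
        """Mark `failed` as down and hand its pending operations to a healthy node."""
        failed.healthy = False
        survivors = [n for n in self.nodes if n.healthy]
        if not survivors:
            raise RuntimeError("no healthy node available for failover")
        # Assumption: take the first healthy node; a real cluster would apply a policy.
        target = survivors[0]
        target.pending.extend(failed.pending)
        failed.pending.clear()
        return target


node_a = Node("A", pending=["UPDATE accounts"])
node_b = Node("B")
cluster = Cluster([node_a, node_b])
new_owner = cluster.fail_over(node_a)
```

In this sketch, the incomplete operation held by node A is assumed by node B once node A is marked as failed.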
In a shared-nothing architecture, each compute node may be responsible for a subset of the data of a single database and for each process associated with that subset. A particular transaction, for example, may be divided among several compute nodes, which execute their respective portions in parallel. When a failure occurs at a particular node or the node's database, the process designated for that node's subset of data is transferred to another healthy node that stores the same subset of data. In a shared-everything (or shared-disk) architecture, each compute node may have equal access to all of the data, as opposed to a subset of the data. In such an architecture, when a compute node's database fails, another compute node's database may efficiently take on the responsibilities of the failed database because each node may have shared access to all of the data, thereby enhancing fault or failure tolerance.
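The difference between the two architectures at failover time may be illustrated with the following minimal sketch. The function names, the `owners` and `replicas` mappings, and the node and subset identifiers are all hypothetical, and the sketch assumes that only the named node has failed.

```python
def reassign_shared_nothing(owners, replicas, failed_node):
    """Shared-nothing: move each subset owned by the failed node to a node
    that stores the same subset.

    owners:   dict mapping subset id -> node currently responsible for it
    replicas: dict mapping subset id -> list of nodes that store that subset
    """
    new_owners = dict(owners)
    for subset, owner in owners.items():
        if owner != failed_node:
            continue
        # Only nodes that hold the same subset of data are eligible.
        candidates = [n for n in replicas[subset] if n != failed_node]
        if not candidates:
            raise RuntimeError(f"no healthy node stores subset {subset!r}")
        new_owners[subset] = candidates[0]
    return new_owners


def successors_shared_everything(nodes, failed_node):
    """Shared-everything: every remaining node can access all of the data,
    so any of them is eligible to take over."""
    return [n for n in nodes if n != failed_node]


owners = {"p0": "node1", "p1": "node2", "p2": "node3"}
replicas = {
    "p0": ["node1", "node2"],
    "p1": ["node2", "node3"],
    "p2": ["node3", "node1"],
}
after = reassign_shared_nothing(owners, replicas, "node2")
eligible = successors_shared_everything(["node1", "node2", "node3"], "node2")
```

Here the shared-nothing case restricts the takeover of subset `p1` to `node3`, the only other node storing it, whereas in the shared-everything case both remaining nodes are eligible.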