1. Technical Field
One or more embodiments relate generally to managing databases. More specifically, one or more embodiments relate to systems and methods of managing a database distributed across a plurality of clusters.
2. Background and Relevant Art
Conventional databases often use a cluster of physical or virtual servers to store data and support operations. To accommodate an increased need for resources or capacity, databases typically allow additional servers to be added to the cluster. Once the size of the cluster is increased, some conventional databases (such as NoSQL databases) can allow the data to be spread across the servers in the larger cluster. Along related lines, to accommodate decreased needs, some conventional databases can allow for the removal of servers from the cluster. Once the size of the cluster is decreased, such databases may allow the data to be redistributed across the servers of the smaller cluster.
While conventional databases provide many advantages, they nonetheless have several drawbacks. For example, conventional NoSQL databases often have a limitation on the number of servers that can be included in a cluster. As such, in order to accommodate larger amounts of data, including larger datasets, multiple clusters may be needed.
Unfortunately, managing a dataset across multiple clusters of a conventional NoSQL database presents various problems. In particular, conventional NoSQL databases often lack the ability to evenly distribute data across a plurality of clusters. The inability to evenly distribute data can lead to overloading of some clusters and simultaneous underutilization of other clusters. Overloading of a cluster can decrease database responsiveness and result in cluster downtime. On the other hand, underutilization of a cluster can waste valuable resources.
Conventional solutions to including more than one cluster in a database typically involve a client-side application sharding the data between clusters. In other words, an application relying upon multiple clusters typically is required to recognize the different clusters and know which data to send to, and request from, which cluster. As a result, an application relying on multiple clusters often requires additional code in order to interact with multiple clusters. The additional code requires additional effort, time, and cost to debug and maintain.
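By way of illustration only, client-side sharding of the kind described above might be sketched as follows. The cluster names and the `pick_cluster` helper are hypothetical and are not drawn from any particular database; the sketch assumes a simple hash-modulo routing rule.

```python
import hashlib

# Hypothetical cluster identifiers; a real application would hold
# connection handles or endpoints for each cluster instead.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def pick_cluster(key: str, clusters=CLUSTERS) -> str:
    """Map a key to a cluster by hashing the key modulo the cluster count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


# The application itself must call this routing logic on every read and write.
target = pick_cluster("user:42")
```

Every read and write path in the application has to pass through such routing logic, which is the additional code to debug and maintain noted above.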
The required complexity and the increased potential for problems related to the use of multiple clusters are exacerbated when there is a need or desire to add or remove clusters. In particular, a developer/administrator typically would need to modify the application to direct data to new clusters or away from removed clusters. Such rebalancing traditionally has been time-consuming and/or inefficient. As the complexity and size of the cluster(s) increase, the burden on the developer/administrator increases accordingly, thereby increasing the time, complexity of analysis, and/or potential risk of errors. Errors in the rebalance process can result in data being unavailable for extended periods of time. In addition, errors in the rebalance process can eventually break data consistency within the database.
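As a hedged illustration of why such rebalancing can be costly, the sketch below assumes an application that routes keys by a hash-modulo rule (an assumption made for this example, not a property of any specific database) and measures how many keys would change clusters when a fourth cluster is added to three. The `shard_index` and `fraction_moved` helpers and the key names are hypothetical.

```python
import hashlib


def shard_index(key: str, cluster_count: int) -> int:
    """Return the cluster index for a key under hash-modulo routing."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cluster_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose cluster changes when the cluster count changes."""
    moved = sum(
        1 for k in keys
        if shard_index(k, old_count) != shard_index(k, new_count)
    )
    return moved / len(keys)


# Hypothetical keys; growing from 3 to 4 clusters.
keys = [f"record-{i}" for i in range(10_000)]
moved = fraction_moved(keys, 3, 4)
print(f"{moved:.0%} of keys would move")
```

With a uniform hash, a key keeps its cluster only when its hash yields the same remainder modulo 3 and modulo 4, so roughly three quarters of the keys would be expected to move under this naive scheme.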
Perhaps due to the complexities of relying upon multiple clusters, some conventional single-cluster databases allow for large numbers of servers. Large single-cluster databases, however, also have several drawbacks. For example, in conventional single-cluster database systems, if the cluster goes down, all data in the system may be inaccessible during the downtime. Losing the ability to access data can lead to the loss of consumer confidence in an application relying upon the database system. With increasing competition and high reliability expectations, such database system downtime can cause a permanent loss of customers.
Additionally, it is not uncommon for conventional database systems to have a least-common-denominator hardware utilization scheme. In other words, a single-cluster database system may only operate at the equivalent of the capabilities of the lowest performing hardware in the cluster. Thus, before gaining benefits from new hardware with higher performance capabilities, all of the hardware in the cluster may need to be replaced.
These and other disadvantages may exist with respect to conventional databases and management of conventional databases.