1. Field of Art
The disclosure generally relates to the field of data storage and, in particular, to the partitioning of data storage.
2. Background Information
As computers, smart phones, tablets, laptops, servers, and other electronic devices increase in performance from year to year, the amount of data they generate also increases. Many conventional relational database systems, such as MYSQL or SQL SERVER, fail to scale economically and efficiently. In particular, a relational database system suffers from long search latency and reliability issues when the amount of data grows into the range of hundreds of Terabytes or even Petabytes.
A distributed database system (herein also referred to as a “NoSQL database system”) offers a performance advantage over the conventional relational database system by distributing contents over multiple storage machines. In one example, data is stored in the NoSQL database system in the form of key-value pairs, where the values are indexed by the keys in a binary tree for faster data retrieval than in the relational database system. Moreover, scalability can be improved by partitioning the data, and storage reliability can be improved by storing multiple copies of the same partition at different storage machines. The storage capacity of the distributed database can be easily increased by adding a new storage machine to the existing machines and distributing a portion of one or more partitions of the content to the new storage machine.
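For illustration only, the partitioning and replication described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from the disclosure: the hash-based partition choice, the round-robin replica placement, and the names partition_of, replicas_of, and TinyStore are all hypothetical, and a plain dictionary stands in for the binary tree index.

```python
import hashlib


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition number (hash partitioning is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_of(partition: int, machines: list, copies: int) -> list:
    """Pick `copies` distinct machines for a partition (round-robin, assumed)."""
    start = partition % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(copies)]


class TinyStore:
    """Hypothetical in-memory stand-in for a partitioned key-value store."""

    def __init__(self, machines, num_partitions=8, copies=2):
        if copies > len(machines):
            raise ValueError("cannot place more copies than there are machines")
        self.machines = list(machines)
        self.num_partitions = num_partitions
        self.copies = copies
        # machine name -> partition number -> {key: value}
        self.data = {m: {} for m in self.machines}

    def put(self, key, value):
        """Write the pair to every machine that holds a copy of its partition."""
        p = partition_of(key, self.num_partitions)
        for m in replicas_of(p, self.machines, self.copies):
            self.data[m].setdefault(p, {})[key] = value

    def get(self, key, skip=()):
        """Read from the first available copy; `skip` simulates failed machines."""
        p = partition_of(key, self.num_partitions)
        for m in replicas_of(p, self.machines, self.copies):
            if m in skip:
                continue
            part = self.data[m].get(p, {})
            if key in part:
                return part[key]
        raise KeyError(key)
```

In this sketch, a read still succeeds when the first machine holding a partition is skipped, because a second copy of the same partition resides on a different machine.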
In a conventional distributed database, when the existing storage machines serving the database are overloaded, new storage machines are added to increase the capacity. Adding a new storage machine involves load-balancing, i.e., redistributing the existing partitions. Specifically, load-balancing includes scanning the existing partitions, maintaining some keys on the existing storage machines, and transferring the rest to the new storage machine. Transferring the key-value pairs of a large partition (e.g., 10 Gigabytes) one at a time is a time-consuming process (e.g., 20 hours). Moreover, the new storage machine is not functional until all the key-value pairs of a new partition are transferred to it. The distributed database cannot serve new requests to read, add, delete, or modify data from the new machine until the redistribution of the existing partitions is completed. Hence, the existing storage machines remain overloaded during the load-balancing, and the load-balancing activity itself adds further to their load.
Accordingly, the conventional distributed database is inefficient in terms of the latency involved in serving new requests received during the load-balancing.
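For illustration only, the conventional load-balancing described above can be sketched as follows. This is a simplified sketch under stated assumptions, not code from any particular database: the Machine class, the owner_index rule for deciding which keys move, and the serving flag are hypothetical, and each machine is modeled as a single in-memory dictionary.

```python
import hashlib


def owner_index(key: str, machine_count: int) -> int:
    """Assumed placement rule: hash the key and take it modulo the machine count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % machine_count


class Machine:
    """Hypothetical storage machine holding key-value pairs in memory."""

    def __init__(self, name):
        self.name = name
        self.pairs = {}
        self.serving = False  # whether the machine may answer requests


def conventional_rebalance(existing, new_machine):
    """Scan all existing pairs, keep some, and move the rest one pair at a time.

    The new machine is marked as serving only after the whole scan finishes,
    which models the delay described in the text.
    """
    total = len(existing) + 1
    new_index = total - 1
    scanned = 0
    moved = 0
    new_machine.serving = False
    for machine in existing:
        # Copy the key list so pairs can be removed while iterating.
        for key in list(machine.pairs):
            scanned += 1
            if owner_index(key, total) == new_index:
                # Transfer a single key-value pair to the new machine.
                new_machine.pairs[key] = machine.pairs.pop(key)
                moved += 1
    new_machine.serving = True
    return scanned, moved
```

Every pair on every existing machine is visited even though only a fraction is moved, and the new machine stays unavailable for the full duration of the scan, which is the source of the latency noted above.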