Fundamentally, computer systems are used for the storage, manipulation, and analysis of data. One mechanism for managing data is a database management system, which may also be called a database system or simply a database. The most common type is the relational database (RDB), which organizes data in tables. Each table has rows, which represent individual entries or records in the database, and columns, which define what is stored in each row. Each table has a unique name within the database, and each column has a unique name within its table. A database may also have an index, which is a data structure that tells the database management system where a certain row is located in a table given an indexed column value, analogous to a book index that tells the reader on which page a given word appears.
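The relationship between a table and its index can be illustrated with a minimal sketch. The table contents, the column names, and the helper functions build_index and lookup are hypothetical and chosen only for illustration; an actual database management system would typically use an on-disk structure such as a B-tree rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical table: each dict is a row (record), each key is a column name.
employees = [
    {"id": 101, "name": "Ada", "dept": "eng"},
    {"id": 102, "name": "Grace", "dept": "eng"},
    {"id": 103, "name": "Edgar", "dept": "ops"},
]

def build_index(table, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index

def lookup(table, index, value):
    """Return the matching rows by consulting the index instead of scanning."""
    return [table[position] for position in index.get(value, [])]

dept_index = build_index(employees, "dept")
eng_rows = lookup(employees, dept_index, "eng")
```

As with a book index, the lookup goes directly to the recorded positions rather than reading every row of the table.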
Data in a database is often partitioned, meaning that a database table is stored using more than one physical data space while still appearing as a single object for data manipulation operations, such as queries, inserts, updates, and deletes. Partitioning has two fundamental types: horizontal and vertical. Horizontal partitioning divides a table into disjoint sets of rows, which are physically stored and accessed separately in different data spaces. In contrast, vertical partitioning divides a table into disjoint sets of columns, which are physically stored and accessed separately in different data spaces. Partitioning can significantly improve the performance of requests that access the data, but it can also decrease performance if done improperly.
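The two partitioning types can be contrasted with a short sketch. Everything here is an illustrative assumption: the sample rows, the modulo rule used to assign rows to horizontal partitions, and the choice to repeat the key column in every vertical fragment so that the fragments can be rejoined into whole rows.

```python
# Hypothetical table used only for illustration.
rows = [
    {"id": 1, "name": "Ada", "city": "London", "balance": 50},
    {"id": 2, "name": "Grace", "city": "New York", "balance": 75},
    {"id": 3, "name": "Edgar", "city": "Oxford", "balance": 20},
    {"id": 4, "name": "Barbara", "city": "Boston", "balance": 90},
]

def partition_horizontally(table, key, num_partitions):
    """Split the rows into disjoint sets using key modulo the partition count."""
    partitions = [[] for _ in range(num_partitions)]
    for row in table:
        partitions[row[key] % num_partitions].append(row)
    return partitions

def partition_vertically(table, key, column_groups):
    """Split the non-key columns into disjoint groups.

    Assumption: each fragment also carries the key column for rejoining.
    """
    return [
        [{key: row[key], **{col: row[col] for col in group}} for row in table]
        for group in column_groups
    ]

def read_all(partitions):
    """Present horizontally partitioned rows as one logical table."""
    return [row for partition in partitions for row in partition]

horizontal = partition_horizontally(rows, "id", 2)
vertical = partition_vertically(rows, "id", [["name", "city"], ["balance"]])
```

In this sketch read_all plays the role of the single logical table: a caller sees every row without knowing which partition holds it.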
Database administrators often partition data so that it is evenly distributed across multiple partitions, which improves the performance of requests that access the partitions because no single partition becomes a bottleneck. Unfortunately, after months or years of operations against the data (e.g., updates, insertions, and deletions), the distribution of data across the partitions may become increasingly uneven, which results in an uneven distribution of requests to the partitions. Performance then decreases because the partitions with the most data receive the most requests and hence become bottlenecks.
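One way such unevenness can arise is sketched below, assuming range partitioning with fixed boundaries and keys that keep increasing over time. The boundaries, the key ranges, and the skew_ratio measure (largest partition size divided by the mean size) are hypothetical choices for illustration, not a metric taken from any particular database system.

```python
from bisect import bisect_right

def partition_for(key, upper_bounds):
    """Return the index of the range partition that holds `key`.

    Partition 0 holds keys below upper_bounds[0], partition i holds keys from
    upper_bounds[i - 1] up to (not including) upper_bounds[i], and the last
    partition holds every key at or above the final bound.
    """
    return bisect_right(upper_bounds, key)

def skew_ratio(counts):
    """Largest partition size divided by the mean size (1.0 is perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))

upper_bounds = [100, 200, 300]  # four partitions: <100, 100-199, 200-299, >=300
counts = [0, 0, 0, 0]

# Initial load: keys 0..399 fall evenly across the four partitions.
for key in range(0, 400):
    counts[partition_for(key, upper_bounds)] += 1
initial_skew = skew_ratio(counts)

# Later inserts use ever-larger keys, so they all land in the last partition.
for key in range(400, 1000):
    counts[partition_for(key, upper_bounds)] += 1
later_skew = skew_ratio(counts)
```

Under these assumptions the last partition ends up holding 700 of the 1000 rows, so it would also receive most of the requests if requests are spread in proportion to the data.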
To correct an uneven distribution of partitioned data, administrators often redistribute the data by moving data between existing partitions or by creating new partitions and copying data from the existing partitions to the new partitions. Current techniques require shutting down the database or blocking requests to it while the data is redistributed. Because of the large amount of data that is often involved, this redistribution may take hours, days, or even weeks, during which time the data is unavailable. Such an extended period of data unavailability is burdensome or unacceptable for many users.
Hence, an enhanced technique for redistributing data across partitions is needed.