A distributed database management system (“distributed database”) may be provided by a hosting provider to store and retrieve data maintained on behalf of the hosting provider's customers. Large quantities of data may be involved. Furthermore, each customer may add data to the system on an ongoing basis, following unpredictable patterns that may vary from customer to customer and change over time.
To handle large volumes of data, the distributed database may divide tables of data into logical collections of data sometimes known as partitions, each of which may be hosted on a different computing node in order to distribute the consumption of storage and workload capacity related to the table. Each partition may, in turn, contain various collections of logically related data. Partitions and the collections of data they contain may differ in their usage patterns and growth characteristics. Some partitions and collections of data may grow rapidly, while others may grow slowly or remain more or less the same size.
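As a minimal sketch of the arrangement described above, the following Python fragment routes rows of a table to partitions and looks up the computing node hosting each partition. Hash-based partitioning is assumed here purely for illustration, since no particular partitioning scheme is specified; the names `partition_for`, `route`, `NUM_PARTITIONS`, and `partition_to_node`, as well as the node labels, are hypothetical.

```python
import hashlib

# Hypothetical number of partitions for one table.
NUM_PARTITIONS = 4

# Hypothetical placement of partitions on computing nodes.
partition_to_node = {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-a"}


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a partition using a stable hash (an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(key: str) -> tuple[int, str]:
    """Return the partition for a key and the node that hosts it."""
    partition = partition_for(key)
    return partition, partition_to_node[partition]


# Example: see where a few customer keys would land.
for key in ("customer-1", "customer-2", "customer-3"):
    print(key, route(key))
```

The same key always maps to the same partition, so storage and workload for the table are spread across whichever nodes host its partitions.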
To evenly distribute consumption of storage and workload capacity, a distributed database may use various techniques to determine which computing node, among the computing nodes that comprise the distributed database, should receive a new partition or collection of data. Random distribution or similar techniques may be employed so that data may be evenly distributed. However, in some cases the resulting distribution may be skewed with respect to the growth rates of the data housed on each computing node. Some computing nodes may experience rapid growth that they do not have the capacity to accommodate, while other computing nodes in the system experience little growth and remain under-utilized.
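The effect described above may be illustrated with a small simulation. The sketch below assumes, hypothetically, that each partition has a fixed growth rate in arbitrary units and that a minority of partitions grow far faster than the rest; the function names, the number of nodes, and the growth values are illustrative assumptions and are not taken from any particular system.

```python
import random


def place_randomly(growth_rates, num_nodes, rng):
    """Assign each partition to a uniformly random node.

    Returns one list per node holding the growth rates of its partitions.
    """
    nodes = [[] for _ in range(num_nodes)]
    for rate in growth_rates:
        nodes[rng.randrange(num_nodes)].append(rate)
    return nodes


def node_growth(nodes):
    """Total growth rate of the data housed on each node."""
    return [sum(rates) for rates in nodes]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical workload: roughly one partition in five grows 50x faster.
growth_rates = [rng.choice([1, 1, 1, 1, 50]) for _ in range(40)]

nodes = place_randomly(growth_rates, num_nodes=4, rng=rng)
counts = [len(rates) for rates in nodes]
growth = node_growth(nodes)

print("partitions per node:", counts)
print("growth per node:   ", growth)
```

Random placement tends to even out the number of partitions per node, but it takes no account of how quickly each partition grows, so the per-node growth totals may differ widely even when the partition counts are similar.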