A distributed database management system (“distributed DBMS”) may maintain a collection of items stored on multiple computing nodes. Each item may be uniquely identified by a primary key. The primary key may be composed of two portions: a leading portion, sometimes referred to as a hash key, and a trailing portion, sometimes referred to as a range key. The leading portion, or hash key, may be used to locate the computing node on which an item is stored. The range key may be used to perform queries over a range of items stored on the computing node indicated by the hash key. A query of this type may apply to all items having the same hash key value, and the applicable set of items may be further limited by applying a filter to the items' range key values.
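As a minimal sketch of the composite primary key described above, the toy store below uses the hash key alone to pick a node and keeps each hash key's items sorted by range key on that node, so a range query consults a single node and filters on range key values. The `Cluster` class, its methods, and the SHA-256-modulo node selection are hypothetical illustrations, not the mechanism of any particular distributed DBMS.

```python
import bisect
import hashlib


class Cluster:
    """Toy store: the hash key selects a node; the range key orders items on it.

    Hypothetical illustration only. Node selection is SHA-256 of the hash key
    reduced modulo the node count.
    """

    def __init__(self, num_nodes: int):
        # Each node maps hash_key -> list of (range_key, value), sorted by range_key.
        self.nodes = [dict() for _ in range(num_nodes)]

    def node_for(self, hash_key: str) -> dict:
        # Only the hash key is used to locate the node.
        digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, hash_key: str, range_key: str, value) -> None:
        rows = self.node_for(hash_key).setdefault(hash_key, [])
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, value)  # same primary key: overwrite
        else:
            rows.insert(i, (range_key, value))  # keep rows sorted by range key

    def query(self, hash_key: str, low=None, high=None) -> list:
        """Items with this hash key, optionally filtered to low <= range_key <= high."""
        rows = self.node_for(hash_key).get(hash_key, [])  # a single node is consulted
        return [(rk, v) for rk, v in rows
                if (low is None or rk >= low) and (high is None or rk <= high)]
```

For example, after `put("user1", "2024-01-01", "a")` and `put("user1", "2024-01-02", "b")`, calling `query("user1", "2024-01-02")` returns only the second item, and both items live on the same node.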
The distributed DBMS may use various schemes to randomize the placement of items across multiple computing nodes while still allowing the node on which an item is stored to be located from its hash key. Random distribution of items may improve the performance of read and write operations because the workload of processing those operations tends to be spread evenly across the computing nodes. Range queries over a range key may remain efficient because all items with a particular hash key value are located on the same computing node.
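One simple randomizing scheme, sketched here under the assumption that placement is a digest of the hash key reduced modulo the node count, shows how even sequential hash keys spread roughly uniformly while any given hash key always maps to the same node. The `place` and `load_per_node` functions are hypothetical names for illustration.

```python
import hashlib
from collections import Counter


def place(hash_key: str, num_nodes: int) -> int:
    # Hypothetical scheme: SHA-256 digest of the hash key, reduced modulo node count.
    # Deterministic, so the node holding an item can always be recomputed from its hash key.
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def load_per_node(hash_keys, num_nodes: int) -> list:
    # Number of hash keys assigned to each node, indexed by node number.
    counts = Counter(place(k, num_nodes) for k in hash_keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


if __name__ == "__main__":
    keys = [f"customer-{i:05d}" for i in range(10_000)]
    print(load_per_node(keys, 8))  # each entry should be near 10_000 / 8 = 1250
```

Reducing modulo the node count is the simplest choice, but it remaps most keys whenever the number of nodes changes; consistent hashing is one commonly used alternative when nodes are added or removed.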
Range queries might also be performed over hash key values. However, if items are randomly distributed among computing nodes, a range query over hash key values may be inefficient: items with adjacent hash key values are likely to be placed on different computing nodes, so the query may need to contact many or all of them. Non-random distribution of items, such as placing contiguous ranges of hash key values on the same computing node, may improve the performance of such range queries, but may lead to hotspots in which workload is overly concentrated on particular computing nodes.
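The trade-off above can be illustrated with a small sketch that assumes integer hash keys in a fixed key space and eight nodes. It compares a randomizing placement (a digest of the key) with an order-preserving placement (contiguous key ranges per node), measuring how many nodes a range query over hash keys must contact and how concentrated writes become when new keys arrive in increasing order, as timestamp-like keys might. `NUM_NODES`, `KEY_SPACE`, and all function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_NODES = 8
KEY_SPACE = 10_000  # hypothetical: integer hash keys in [0, KEY_SPACE)


def hash_place(key: int, num_nodes: int = NUM_NODES) -> int:
    """Randomizing placement: node chosen from a digest of the key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def range_place(key: int, num_nodes: int = NUM_NODES) -> int:
    """Order-preserving placement: contiguous key ranges map to the same node."""
    return key * num_nodes // KEY_SPACE


def nodes_touched(place, low: int, high: int) -> set:
    """Nodes a range query over hash keys [low, high] would have to contact."""
    return {place(k) for k in range(low, high + 1)}


def max_load(place, keys) -> int:
    """Largest number of the given keys landing on any single node."""
    return max(Counter(place(k) for k in keys).values())


if __name__ == "__main__":
    # Range query over 100 adjacent hash keys.
    print(len(nodes_touched(range_place, 1000, 1099)))  # 1 node
    print(len(nodes_touched(hash_place, 1000, 1099)))   # typically all 8 nodes
    # 1000 writes with increasing keys (e.g. recent timestamps).
    recent = range(9000, 10_000)
    print(max_load(range_place, recent))  # 1000: every write hits one node (a hotspot)
    print(max_load(hash_place, recent))   # roughly 125 per node
```

Under these assumptions, order-preserving placement makes the range query cheap but sends every write in the recent key range to a single node, while randomized placement balances the writes but scatters the range query across the cluster.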