In large databases, data may be divided among storage devices connected to multiple nodes. Such databases may be referred to as distributed databases.
Distributed databases may provide for improved performance, since access requests can be handled by multiple machines. In addition, distributed databases may be expanded relatively easily and may provide for improved fault tolerance in the event of component failure. Moreover, distributed databases may be constructed using multiple relatively low-cost servers and storage devices, which may be less costly than a single centralized server with equivalent capacity.
Data may, for example, be distributed among nodes of a distributed database by hashing, range partitioning, or round robin on particular fields or columns. A given distribution scheme may allow for efficient execution of some access requests, but may reduce performance for other access requests.
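By way of illustration only, the three distribution schemes named above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular database: the function names (hash_node, range_node, make_round_robin), the node count, and the range boundaries are hypothetical and chosen only for the example.

```python
import zlib
from bisect import bisect_right
from itertools import count

NUM_NODES = 4  # hypothetical cluster size, assumed for illustration


def hash_node(key, num_nodes=NUM_NODES):
    """Hash distribution: map a column value to a node by hashing it.

    zlib.crc32 is used instead of the built-in hash() so the mapping
    is stable across interpreter runs.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def range_node(key, boundaries):
    """Range distribution: map a column value to a node by sorted boundaries.

    With boundaries [b0, b1, ...], node 0 holds keys < b0, node 1 holds
    keys in [b0, b1), and the last node holds keys >= the final boundary.
    """
    return bisect_right(boundaries, key)


def make_round_robin(num_nodes=NUM_NODES):
    """Round-robin distribution: assign rows to nodes in arrival order."""
    counter = count()

    def assign(_row=None):
        # The row contents are ignored; only the arrival order matters.
        return next(counter) % num_nodes

    return assign


if __name__ == "__main__":
    boundaries = [100, 200, 300]  # hypothetical split points for 4 nodes
    assign = make_round_robin()
    for customer_id in (42, 150, 250, 999):
        print(customer_id,
              hash_node(customer_id),
              range_node(customer_id, boundaries),
              assign())
```

In such a sketch, an exact-match lookup on the hashed column can be routed to a single node, whereas a range query on that same column would generally need to consult every node. Under range distribution the reverse tends to hold, which illustrates how a scheme that suits some access requests may reduce performance for others.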
Typically, a distribution scheme may be designed by a database administrator (DBA) to suit an anticipated workload of the database. For example, a database administrator may attempt to distribute data in a manner likely to allow efficient execution of common access requests.
Unfortunately, the data stored in a database, as well as the workload experienced by the database, may change over time. For example, new data sets may be added to a database or new applications may be deployed for accessing the database. Such changes may render existing distribution schemes inefficient.