Data may be partitioned among multiple data sources, each referred to as a “shard.” In such an architecture, the data to be stored in the shards is assigned an identifier, such as a customer's e-mail address or store number. A range of identifier values is then mapped to a specific shard. When data is created, it is placed in the shard that corresponds to its assigned identifier value. For example, data for a customer's first 100 stores may be stored in one shard, data for the customer's second 100 stores may be stored in another shard, and so forth.
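The range-based mapping described above can be illustrated with a minimal sketch. It assumes integer store numbers as identifiers and uses hypothetical shard names ("shard-1", "shard-2") and hypothetical class names (`ShardRange`, `RangeShardMap`); it is not drawn from any particular database product.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardRange:
    low: int    # inclusive lower bound of the identifier range
    high: int   # inclusive upper bound of the identifier range
    shard: str  # hypothetical shard name


class RangeShardMap:
    """Hypothetical map from identifier ranges (e.g., store numbers) to shards."""

    def __init__(self, ranges: List[ShardRange]):
        # Keep ranges sorted by lower bound so a binary search can find them.
        self.ranges = sorted(ranges, key=lambda r: r.low)
        self._lows = [r.low for r in self.ranges]

    def shard_for(self, identifier: int) -> str:
        # Find the last range whose lower bound is <= identifier.
        i = bisect.bisect_right(self._lows, identifier) - 1
        if i < 0 or identifier > self.ranges[i].high:
            raise KeyError(f"no shard mapped for identifier {identifier}")
        return self.ranges[i].shard


# Stores 1-100 go to one shard, stores 101-200 to another.
shard_map = RangeShardMap([
    ShardRange(1, 100, "shard-1"),
    ShardRange(101, 200, "shard-2"),
])
print(shard_map.shard_for(42))   # prints shard-1
print(shard_map.shard_for(150))  # prints shard-2
```

A sorted list with binary search keeps lookups logarithmic in the number of ranges; an identifier outside every mapped range is rejected rather than silently assigned.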
As the volume of stored data increases, however, additional shards may need to be added. Adding shards presents the problem of how to relocate existing data from the existing shards to the new ones. For instance, referring to the above example, instead of storing data for stores 1-100 in a single shard (e.g., shard #1), data for stores 1-50 may be stored in shard #1, whereas data for stores 51-100 may be stored in a new shard (e.g., shard #2). As a result, existing data must be moved from one shard into another, such as moving the data for stores 51-100 from shard #1 into shard #2. Although database replication software exists for moving data between shards, such a move requires significant computational resources, for example when hundreds of millions of orders must be moved in an online transaction processing environment. Furthermore, the sharded data, and the applications that access it, may contain the old shard's database information, which must then be updated for every record that is moved.
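The cost of splitting a range can be sketched as follows. This is an illustration of the problem described above, not a proposed solution, and it assumes a hypothetical in-memory shard map keyed by inclusive store-number ranges, hypothetical helper names (`split_range`, `lookup`, `records_to_move`), and hypothetical order records given as `(record_id, store_id)` pairs. Every record whose store number falls in the split-off range must be physically moved and have its shard reference rewritten, so the work grows with the number of records rather than the number of ranges.

```python
from typing import Dict, List, Tuple

# Hypothetical shard map: inclusive (low, high) store-number range -> shard name.
ShardMap = Dict[Tuple[int, int], str]


def split_range(shard_map: ShardMap, old_range: Tuple[int, int],
                split_at: int, new_shard: str) -> ShardMap:
    """Return a new map in which identifiers >= split_at move to new_shard."""
    low, high = old_range
    if not (low < split_at <= high):
        raise ValueError("split point must fall inside the range")
    new_map = dict(shard_map)
    old_shard = new_map.pop(old_range)
    new_map[(low, split_at - 1)] = old_shard  # lower half stays put
    new_map[(split_at, high)] = new_shard     # upper half goes to the new shard
    return new_map


def lookup(shard_map: ShardMap, store_id: int) -> str:
    """Linear scan for the range containing store_id."""
    for (low, high), shard in shard_map.items():
        if low <= store_id <= high:
            return shard
    raise KeyError(store_id)


def records_to_move(records: List[Tuple[str, int]], old_map: ShardMap,
                    new_map: ShardMap) -> List[Tuple[str, str, str]]:
    """List (record_id, from_shard, to_shard) for every record whose shard changes."""
    moves = []
    for record_id, store_id in records:
        src = lookup(old_map, store_id)
        dst = lookup(new_map, store_id)
        if src != dst:
            moves.append((record_id, src, dst))
    return moves


old_map: ShardMap = {(1, 100): "shard-1"}
new_map = split_range(old_map, (1, 100), 51, "shard-2")
orders = [("order-a", 10), ("order-b", 51), ("order-c", 99)]
print(records_to_move(orders, old_map, new_map))
# prints [('order-b', 'shard-1', 'shard-2'), ('order-c', 'shard-1', 'shard-2')]
```

Updating the shard map itself is a constant-size change; it is the per-record relocation, repeated for every order in stores 51-100, that becomes expensive at the scale of hundreds of millions of orders.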