It is often desirable to implement large databases using many small computers rather than a single server with numerous processors and large memory banks. These small computers are often described as commodity hardware because, in contrast with the most powerful servers, they are relatively inexpensive, easily obtained, and readily interchangeable. However, despite the advantages of using commodity hardware, creating large databases that are scalable and efficient remains a challenging endeavor.
One technique for implementing large database systems on commodity hardware involves utilizing shards. In general terms, a shard is a computing system responsible for solving a small portion of a larger computing problem. For example, when shards are used in large-scale data management systems, tables may be divided by row into what are known as horizontal partitions. Each shard manages one partition and responds to requests and commands involving that partition. Compared with traditional database systems, this approach can reduce costs and improve performance.
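The division of a table into horizontal partitions can be illustrated with a minimal sketch. The class names `Shard` and `ShardedTable`, the integer row key, and the modulo routing rule are all assumptions made for illustration; the text above does not prescribe any particular partitioning scheme.

```python
class Shard:
    """One shard: owns a single horizontal partition (a subset of the rows)."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}  # row key -> row data held by this shard only

    def put(self, key, row):
        self.rows[key] = row

    def get(self, key):
        return self.rows.get(key)


class ShardedTable:
    """Routes each request to the shard responsible for the row's key."""

    def __init__(self, num_shards):
        self.shards = [Shard(i) for i in range(num_shards)]

    def shard_for(self, key):
        # Hypothetical routing rule: integer key modulo the shard count.
        return self.shards[key % len(self.shards)]

    def put(self, key, row):
        self.shard_for(key).put(key, row)

    def get(self, key):
        return self.shard_for(key).get(key)


# Nine rows spread across three shards; each shard sees only its partition.
table = ShardedTable(num_shards=3)
for key in range(9):
    table.put(key, {"id": key, "name": f"row-{key}"})
```

In this sketch a request for a given key touches exactly one shard, which is what allows each shard to run on inexpensive hardware and handle only a fraction of the total workload.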
However, shard-based systems can be difficult to optimize. Each shard may include an independent data management system that has a query optimizer capable of generating an execution plan for a query that runs on that shard. While the execution plan may be locally optimized for execution on that shard, it is not necessarily optimized across all shards. Because of differences in performance characteristics and the data managed by each shard, a plan that runs efficiently on one shard may run poorly on another.
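A toy cost model can show why a locally optimized plan does not carry over between shards. Everything in the following sketch is a hypothetical illustration: the `ShardStats` fields, the two candidate plans, and the cost formulas are invented for this example and do not describe any real query optimizer.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    name: str
    row_count: int
    matching_fraction: float  # fraction of rows the query predicate matches
    random_read_cost: float   # relative cost of one index-driven row fetch
    seq_read_cost: float      # relative cost of one sequentially scanned row


def full_scan_cost(s):
    # Reads every row in the partition sequentially.
    return s.row_count * s.seq_read_cost


def index_lookup_cost(s):
    # Fetches only matching rows, but each fetch is a costlier random read.
    return s.row_count * s.matching_fraction * s.random_read_cost


PLANS = {"full_scan": full_scan_cost, "index_lookup": index_lookup_cost}


def best_local_plan(s):
    """Pick the cheapest plan using only this shard's own statistics."""
    return min(PLANS, key=lambda name: PLANS[name](s))


# Same query, same table schema, but different data in each partition.
shard_a = ShardStats("A", 1_000_000, 0.001, 10.0, 1.0)  # predicate is rare
shard_b = ShardStats("B", 1_000_000, 0.5, 10.0, 1.0)    # predicate is common

for shard in (shard_a, shard_b):
    print(shard.name, best_local_plan(shard))
```

Under these assumed numbers, shard A prefers the index lookup while shard B prefers the full scan, so forcing shard A's plan onto shard B would cost several times more than shard B's own choice.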