In a computerized content delivery network, a content server typically selects a content item to display in conjunction with an electronic resource when the resource is viewed by a user. For example, the content item may be an advertisement and the electronic resource may be a webpage. The content server can use a variety of selection criteria to select a content item to display. For example, the content server may select a content item if the keywords associated with the content item match the subject matter of the electronic resource (e.g., same topic, same theme, etc.) and/or if the keywords associated with the content item match the established interests of the user viewing the resource.
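The selection criteria described above can be illustrated with a minimal sketch. It assumes that keywords, resource topics, and user interests are plain strings and that a "match" means at least one shared string. The function names and sample data are hypothetical and are not taken from any particular content server.

```python
def matches(item_keywords, resource_topics, user_interests):
    """Return True if any keyword matches the resource's subject matter
    and/or the user's established interests (assumed: exact string match)."""
    keywords = set(item_keywords)
    return bool(keywords & set(resource_topics)) or bool(keywords & set(user_interests))


def select_content_items(items, resource_topics, user_interests):
    """Return the names of all content items eligible for display."""
    return [
        name
        for name, item_keywords in items.items()
        if matches(item_keywords, resource_topics, user_interests)
    ]


# Hypothetical content items and their associated keywords.
items = {
    "ad_hiking_boots": ["hiking", "outdoors"],
    "ad_coffee": ["coffee"],
    "ad_laptops": ["computers"],
}

# The webpage is about hiking; the user has an established interest in coffee.
selected = select_content_items(items, ["hiking"], ["coffee"])
print(selected)  # ['ad_hiking_boots', 'ad_coffee']
```

A real content server would likely weigh additional criteria alongside keyword overlap, such as the bid parameters. This sketch covers only the keyword matching step.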
A content provider (e.g., an advertiser) can interact with the content server via a management interface to add, remove, or change the keywords associated with content items that are managed by the content provider. The management interface can also be used to adjust other parameters affecting the distribution of the managed content items (e.g., ad group parameters, ad campaign parameters, bid parameters, etc.).
In a large content delivery network, it is not uncommon for a content provider to manage millions of content items having tens of millions or even hundreds of millions of keywords associated therewith. Due to the online nature of a content delivery network, it is often necessary to read through all of this data quickly to serve a request. The sheer scale of the data often mandates parallel processing, a prerequisite of which is a mechanism to split the large data set into smaller chunks.
One traditional technique for splitting a data set into chunks is known as static sharding. With static sharding, a database is divided into multiple independent chunks or shards according to a predefined and fixed distribution scheme. Because the scheme is fixed, static sharding does not handle different data distributions well and does not adapt to changes in the distribution of a data set, which can lead to unbalanced shards. For example, some shards may end up with significantly more data than other shards, thereby reducing the efficiency of processing the various shards in parallel. It is challenging to provide a fast and efficient mechanism for searching and/or processing data at a large scale without sacrificing adaptability.
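The imbalance problem can be illustrated with a minimal sketch of static sharding. It assumes a hypothetical fixed scheme that assigns each record to a shard by content provider identifier modulo a fixed shard count. The identifiers, shard count, and data volumes are invented for illustration only.

```python
from collections import Counter

NUM_SHARDS = 4  # assumed: fixed when the distribution scheme is defined


def static_shard(provider_id, num_shards=NUM_SHARDS):
    """Assign a record to a shard using a predefined, fixed rule."""
    return provider_id % num_shards


def shard_sizes(records, num_shards=NUM_SHARDS):
    """Count how many records land in each shard."""
    counts = Counter(static_shard(provider_id, num_shards) for provider_id, _ in records)
    return [counts.get(shard, 0) for shard in range(num_shards)]


# Hypothetical skewed data set: provider 1 manages 1000 keywords,
# while providers 2, 3, and 4 manage 10 keywords each.
records = [(1, f"kw{i}") for i in range(1000)]
records += [(pid, f"kw{i}") for pid in (2, 3, 4) for i in range(10)]

sizes = shard_sizes(records)
print(sizes)  # [10, 1000, 10, 10]
```

In this sketch, a parallel job over the four shards finishes only when the worker handling the 1000-record shard finishes, so most of the parallel capacity sits idle. The fixed rule has no way to rebalance as the data distribution changes.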