Today's large data centers manage collections of data comprising billions of data items. In such large collections, searching for particular items that meet the conditions of a given search query is a task that consumes a significant amount of computing resources. It also takes a noticeable amount of time, even on the most powerful multiprocessor computer systems. In many applications, search query response time is critical, either because of specific technical requirements or because of the high expectations of human users. Various conventional methods are used to reduce search query execution time.
Typically, in building a search-efficient data collection management system, data items are indexed according to some or all of the possible search terms that may be contained in search queries. An “inverted index” of the data collection is created (and maintained and updated) by the system for use in the execution of search queries. An inverted index comprises a number of “posting lists”. Each posting list corresponds to a search term and contains references to the data items that include that search term (or otherwise satisfy some other condition that is expressed by the search term). For example, if the data items are text documents, as is often the case for Internet search engines, then the search terms are individual words (and/or some of their most frequently used combinations), and the inverted index has one posting list for every word that has been encountered in at least one of the documents. In another example, the data collection is a database comprising one or more very long tables. The data items are individual records (i.e. the rows of a table), each having a number of attributes represented by values in the appropriate columns of the table. The search terms are specific attribute values, or other conditions or attributes. The posting list for a search term is a list of references (indexes, ordinal numbers) to the records that satisfy the search term.
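By way of illustration only, the text-document case can be sketched as follows. The function name `build_inverted_index` and the whitespace tokenization are assumptions made for this sketch; real systems use far more elaborate tokenization and compressed posting-list formats. Each posting list is kept sorted by document id, which is a common convention that lets posting lists be intersected by a linear merge.

```python
from collections import defaultdict


def build_inverted_index(documents: dict[int, str]) -> dict[str, list[int]]:
    """Map each word to a sorted posting list of the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization (an assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    # Posting lists are stored sorted so that they can later be intersected by merging.
    return {word: sorted(ids) for word, ids in index.items()}


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 3]
print(index["the"])    # [1, 2]
```

A query for a single word then only reads that word's posting list rather than scanning the documents themselves.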
To speed up execution of search queries, the inverted index is typically stored in a fast access memory device (e.g. RAM) of one or more computer systems, while the data items themselves are stored on larger but slower storage media (e.g. on magnetic or optical disks or other similar large capacity devices). In this way, the processing of a search query will involve searching through one or more posting lists of the inverted index in the fast access memory device rather than through the data items themselves (in the slower access storage device). This generally allows search queries to be performed at a much higher speed.
To speed up search query processing further, a very large data collection is typically divided into a number of partitions commonly termed “shards”, with each shard being hosted on a separate computer system (a “server”) and having its own inverted index. The data collection management system comprises networked means for distributing queries to all (or some, as the case may be) of the shards, and for collecting and aggregating the partial search results obtained by processing those distributed queries on their respective shards.
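The distribute-and-aggregate pattern can be sketched as below, under stated assumptions: each shard's inverted index is modelled as an in-memory dictionary (the `Shard` alias is hypothetical), document ids are assumed to be globally unique across shards, and local threads stand in for the separate servers and the network that a real system would use. The helper names `search_shard` and `search_all_shards` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical representation of one shard's inverted index: term -> sorted doc ids.
Shard = dict[str, list[int]]


def search_shard(shard: Shard, terms: list[str]) -> list[int]:
    """Return the ids of documents in this shard that contain every query term."""
    postings = [set(shard.get(term, [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def search_all_shards(shards: list[Shard], terms: list[str]) -> list[int]:
    """Distribute the query to every shard in parallel, then aggregate the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_results = list(pool.map(lambda shard: search_shard(shard, terms), shards))
    # Aggregation step: merge the per-shard partial results into one sorted list.
    return sorted(doc_id for part in partial_results for doc_id in part)


shard_a = {"apple": [1, 2], "pie": [2]}
shard_b = {"apple": [10], "pie": [10, 11]}
print(search_all_shards([shard_a, shard_b], ["apple", "pie"]))  # [2, 10]
```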
In applications where many search queries are to be rapidly processed in parallel, a further enhancement is often applied: all (or some) of the shards are replicated, so that each such shard exists in multiple copies within the data collection management system. As an example, the data collection or the inverted index may be split into N shards, with each shard being replicated in M copies, called “replicas”. Each individual search query is then replicated and distributed to the N shards for separate execution on each shard. At the shard level, the query is assigned for execution to one of the M replicas of each of the N shards. For example, a collection may be broken down into two shards [N=2], with each shard having three replicas [M=3]. Thus, there will be
(a) shard 1, replica 1 [Sh1-1];
(b) shard 1, replica 2 [Sh1-2];
(c) shard 1, replica 3 [Sh1-3];
(d) shard 2, replica 1 [Sh2-1];
(e) shard 2, replica 2 [Sh2-2];
(f) shard 2, replica 3 [Sh2-3];
and, as an example, the query may be executed on shard 1, replica 3 [Sh1-3] and on shard 2, replica 2 [Sh2-2], since the query is typically executed on (a replica of) every shard. The results of the search on each shard are then aggregated to yield a final search result.
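The text does not specify how one of the M replicas of a shard is chosen for a given query. The following sketch assumes a simple round-robin policy per shard; the class `ReplicaRouter` and its method `route` are hypothetical names, and production systems commonly weigh replica load or health instead. Replica labels follow the [Sh{shard}-{replica}] notation used above.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router that picks one replica of every shard for each incoming query.

    Round-robin is an assumed policy, used here only for illustration.
    """

    def __init__(self, num_shards: int, num_replicas: int) -> None:
        # One independent round-robin cycle over replica numbers 1..M for each shard.
        self._cycles = [itertools.cycle(range(1, num_replicas + 1))
                        for _ in range(num_shards)]

    def route(self) -> list[str]:
        """Return the replica label chosen for every shard, e.g. ['Sh1-1', 'Sh2-1']."""
        return [f"Sh{shard}-{next(cycle)}"
                for shard, cycle in enumerate(self._cycles, start=1)]


router = ReplicaRouter(num_shards=2, num_replicas=3)  # N=2, M=3 as in the example
print(router.route())  # ['Sh1-1', 'Sh2-1']
print(router.route())  # ['Sh1-2', 'Sh2-2']
```

Each query still reaches every shard exactly once; only the choice of replica rotates, which spreads concurrent queries across the M copies.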
A further level of parallelism can be achieved by dividing the data collection into smaller shards, such that one server may host more than one of these smaller shards. In this way, each individual search query can be further parallelized by using, on a given server, a separate execution thread for every distributed query addressing one of the smaller shards hosted on that server. If, for example, the entire data collection is divided into 2000 such “virtual shards” that are distributed among 1000 servers with two shards per server, then search queries will be processed by 2000 parallel threads on 1000 servers, rather than by only 1000 threads.
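The 2000-virtual-shard example can be sketched as follows, assuming a round-robin placement of virtual shards onto servers (an assumption; the text only states that there are two shards per server). The helper names `place_virtual_shards` and `run_on_server` and the `VirtualShard` alias are hypothetical. `run_on_server` uses one thread per locally hosted virtual shard and then performs the intra-server aggregation of partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical representation of a virtual shard's inverted index: term -> sorted doc ids.
VirtualShard = dict[str, list[int]]


def place_virtual_shards(num_virtual_shards: int, num_servers: int) -> dict[int, list[int]]:
    """Assign virtual shard ids to servers round-robin (an assumed placement policy)."""
    placement: dict[int, list[int]] = {server: [] for server in range(num_servers)}
    for shard_id in range(num_virtual_shards):
        placement[shard_id % num_servers].append(shard_id)
    return placement


def run_on_server(local_shards: list[VirtualShard], term: str) -> list[int]:
    """Run a single-term query with one thread per local virtual shard, then merge
    the partial results on the server before they are sent over the network."""
    with ThreadPoolExecutor(max_workers=max(1, len(local_shards))) as pool:
        partials = list(pool.map(lambda shard: shard.get(term, []), local_shards))
    return sorted(doc_id for part in partials for doc_id in part)


placement = place_virtual_shards(2000, 1000)
print(len(placement[0]))                                  # 2
print(sum(len(ids) for ids in placement.values()))       # 2000
print(run_on_server([{"fox": [1, 4]}, {"fox": [7]}], "fox"))  # [1, 4, 7]
```

The sum of 2000 is the number of threads one query occupies across the whole system, one per virtual shard, compared with 1000 when each server hosts a single shard.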
However, such a static partitioning of a data collection into a greater number of shards may result in an overall loss of performance. This is because the execution time of a search query does not decrease in inverse proportion to the number of shards, but at a much slower rate. The partial results obtained by processing the individual distributed queries must be aggregated, both on the same server (if that server hosts several virtual shards) and then over an inter-server network for shards on different servers, and this aggregation becomes increasingly complex and resource-consuming as the number of shards grows. In addition, pruning (i.e. the early termination of a search according to some predefined criterion, such as the number of search results obtained) works more efficiently on larger shards.
Hence, in the above example with 1000 servers, if the number of shards is increased from 1000 to 2000, the average execution time of an individual search query may decrease to, for example, ⅔ of the time it took with 1000 shards, rather than to the expected ½. However, if every server has K available threads, each search query now occupies not one but two of them on every server, so the maximum number of queries that can be executed in parallel is halved. Taking the original execution time of a query as one time unit, the total throughput of the fully loaded system therefore decreases from K queries per time unit to (½)(3/2)K = ¾K queries per time unit. Thus, when the system receives queries at an average rate greater than ¾K queries per time unit, the excess queries will wait for execution in an input queue. This increases the total response time of the system, which is the opposite of what the change was intended to achieve. Known methods of increasing the number of shards in order to uniformly decrease the execution time of search queries therefore work sufficiently well up to a certain system load, beyond which they introduce the opposite, slowing-down effect.
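The throughput arithmetic above can be checked with a short calculation. The sketch assumes, as the example does, that one time unit equals the original (1000-shard) execution time of a query and that a fully loaded server keeps all K threads busy; the function name `saturated_throughput` and the value K = 12 are hypothetical, chosen only to make the numbers concrete. Exact fractions are used so that the ¾ ratio is not obscured by floating-point rounding.

```python
from fractions import Fraction


def saturated_throughput(threads_per_server: int, threads_per_query: int,
                         relative_exec_time: Fraction) -> Fraction:
    """Queries completed per time unit when every thread on every server is busy.

    One time unit is the execution time of a query in the original configuration.
    """
    concurrent_queries = Fraction(threads_per_server, threads_per_query)
    return concurrent_queries / relative_exec_time


K = 12  # hypothetical number of threads per server
before = saturated_throughput(K, 1, Fraction(1))     # 1000 shards: 1 thread, 1 time unit
after = saturated_throughput(K, 2, Fraction(2, 3))   # 2000 shards: 2 threads, 2/3 time unit
print(before, after, after / before)  # 12 9 3/4
```

The ratio is ¾ for any K, which matches the (½)(3/2)K = ¾K figure in the text.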
The individual servers that host each shard, replica of a shard (in a multi-replica system), virtual shard, or replica of a virtual shard are typically multiprocessor systems, with each processor having more than one processing core and each processing core being multithreaded. Each server is thus capable of simultaneous multithreading. These additional computing capabilities make it possible to execute simultaneously, on one single physical server, a number of parallel execution threads performing the same search query on different shards located on the server, different search queries on the same shard located on the server, and/or different search queries on different shards located on the server. While this is a further enhancement to the search system, what is not conventionally possible at present is to have different threads execute the same search query on the same shard on the same server.
Aside from the amount of resources available in a given system to execute search queries, there is an additional consideration with respect to executing search queries: search queries do not have a uniform complexity. Some search queries are much more complex than others, leading to very different search execution times. For example, a search query containing two search terms that each occur relatively frequently (e.g. two common English words) but that rarely occur together in the same document would typically take much longer to execute than a search query containing two search terms that occur relatively infrequently but that are somehow related and often appear together in the same document.
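One way to see why such queries differ in cost is to count the work done when intersecting their posting lists. The sketch below assumes a plain linear merge of two sorted posting lists; the function name `intersect_with_cost` and the synthetic document ids are hypothetical, and real search engines typically reduce this cost with skip pointers, galloping search, or pruning. Two frequent terms that never co-occur force the merge through about 200,000 comparisons to return nothing, while two rare terms that co-occur are resolved in three steps.

```python
def intersect_with_cost(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Merge-intersect two sorted posting lists and count the comparison steps taken."""
    i = j = steps = 0
    result: list[int] = []
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result, steps


# Two frequent terms that never appear in the same document: long lists, empty result.
frequent_a = list(range(0, 200_000, 2))  # 100,000 even document ids
frequent_b = list(range(1, 200_000, 2))  # 100,000 odd document ids
result, steps = intersect_with_cost(frequent_a, frequent_b)
print(len(result), steps)  # 0 199999

# Two rare terms that usually appear together: short lists, quick answer.
print(intersect_with_cost([5, 17, 42], [5, 42, 99]))  # ([5, 42], 3)
```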
Thus, while current conventional computer systems are adequate for handling the simultaneous execution of multiple searches, improvement over such systems is nonetheless possible.