In a typical information retrieval system, it is common for multiple clients (e.g., users or applications) to concurrently submit queries to the system for execution. If the information retrieval system does not have sufficient computing resources to run all of these queries simultaneously, the system will generally need to determine which queries to process immediately and which to put on hold for later execution. This task is known as query scheduling.
One way to perform query scheduling is to execute incoming queries in the order they arrive (referred to as a “first-come, first-served” approach). However, a significant problem with this approach is that it cannot differentiate between queries that have differing response time requirements. For example, user queries (i.e., queries originating from human users via, for instance, an interactive UI) are typically time-sensitive and thus should be executed as soon as possible. On the other hand, automated queries (e.g., batch queries or precomputational queries) are generally less time-sensitive, and thus can tolerate some delay in response time. If queries are simply scheduled according to their order of arrival, some time-sensitive queries may be forced to wait behind time-insensitive queries, which can adversely affect the usability and responsiveness of the system.
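The blocking effect described above can be illustrated with a minimal sketch. It assumes a single worker, queries that all arrive at time zero, and run times that are supplied only so the simulation can advance its clock; the function name `fcfs_wait_times` and the query names are hypothetical and chosen purely for illustration.

```python
from collections import deque


def fcfs_wait_times(queries):
    """Simulate first-come, first-served scheduling on one worker.

    queries: list of (name, run_seconds) tuples in arrival order.
    Returns a dict mapping each query name to the time it waited
    before starting. All queries are assumed to arrive at t = 0.
    """
    queue = deque(queries)
    clock = 0.0
    waits = {}
    while queue:
        # The oldest query always runs next, regardless of who sent it.
        name, run_seconds = queue.popleft()
        waits[name] = clock
        clock += run_seconds
    return waits


# Hypothetical workload: a batch query arrives just ahead of a user query.
arrivals = [("batch_report", 120.0), ("user_search", 0.5)]
waits = fcfs_wait_times(arrivals)
```

In this workload the time-sensitive `user_search` query waits for the full duration of `batch_report` before it can start, even though it needs only half a second of execution time.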
Another way to perform query scheduling is to implement a fixed priority scheme. With this scheme, each query is assigned a priority based on some property known at the time of query arrival (e.g., the identity or type of the query requestor), and queries are scheduled according to their assigned priorities. This can avoid the problem noted above with the first-come, first-served approach, since time-sensitive queries (e.g., user queries) can be prioritized over less time-sensitive queries (e.g., automated queries). Unfortunately, fixed priority scheduling also suffers from certain drawbacks. For instance, while fixed priority scheduling can be used to classify and prioritize queries based on non-runtime properties, it generally cannot distinguish “heavy” queries (i.e., queries that take a relatively long time to execute, such as on the order of minutes) from “light” queries (i.e., queries that take a relatively short time to execute, such as on the order of milliseconds or seconds), since query execution time is only ascertainable once a query has started running. This means that a heavy query can be scheduled before, and thus block, light queries in the same priority class, thereby increasing average query response time.
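A fixed priority scheme of this kind, and its drawback, can be sketched as follows. This is an illustrative model only: the `PRIORITY` table, the function `fixed_priority_order`, and the query names are assumptions, a single worker is assumed, and ties within a priority class are assumed to be broken by arrival order. The scheduler never inspects the run time when ordering queries; the simulation uses it only to advance the clock.

```python
import heapq

# Hypothetical priority table: a lower number is scheduled first.
PRIORITY = {"user": 0, "automated": 1}


def fixed_priority_order(queries):
    """Simulate fixed priority scheduling on one worker.

    queries: list of (name, requestor_type, run_seconds) in arrival order.
    Returns a list of (name, start_time) in execution order.
    """
    heap = []
    for seq, (name, requestor, run_seconds) in enumerate(queries):
        # Priority comes only from a property known at arrival time;
        # seq breaks ties by arrival order within a priority class.
        heapq.heappush(heap, (PRIORITY[requestor], seq, name, run_seconds))

    clock = 0.0
    schedule = []
    while heap:
        _, _, name, run_seconds = heapq.heappop(heap)
        schedule.append((name, clock))
        clock += run_seconds  # used by the simulation, not by the scheduler
    return schedule


# Hypothetical workload: one automated query and two user queries.
arrivals = [
    ("nightly_batch", "automated", 300.0),
    ("user_heavy", "user", 180.0),
    ("user_light", "user", 0.2),
]
schedule = fixed_priority_order(arrivals)
```

Both user queries now run ahead of `nightly_batch`, but `user_light` still waits three minutes behind `user_heavy`, because the two share a priority class and nothing known at arrival time distinguishes them.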