Many traditional information retrieval systems operate according to a “receive query/execute query/return response” paradigm. With this paradigm, a user first submits a request for information, known as a query, to a query engine of the system. Upon receiving the query, the query engine executes the query against a body of data (i.e., the “data corpus”) and generates a result. Finally, the query engine returns the generated result to the user, thereby fulfilling the user's information request.
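The three steps of this paradigm can be illustrated with a minimal sketch. It is not taken from any particular system: the `QueryEngine` class, its `handle` method, and the modeling of a query as a predicate over in-memory records are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical modeling choices: a record is a dict, and a query is a
# predicate that says whether a given record matches.
Record = Dict[str, object]
Query = Callable[[Record], bool]


class QueryEngine:
    """Illustrative engine following receive query / execute query / return response."""

    def __init__(self, corpus: Iterable[Record]) -> None:
        # The "data corpus" is held as a simple in-memory list.
        self.corpus: List[Record] = list(corpus)

    def handle(self, query: Query) -> List[Record]:
        # 1. Receive: the query arrives as the argument.
        # 2. Execute: the query is run in full against the entire corpus.
        result = [record for record in self.corpus if query(record)]
        # 3. Return: the generated result goes back to the caller.
        return result


engine = QueryEngine([{"id": 1, "tag": "a"}, {"id": 2, "tag": "b"}])
matches = engine.handle(lambda record: record["tag"] == "a")
```

Note that in this sketch all of the work happens inside `handle`, at the moment the query is submitted, so the caller waits for the full scan of the corpus before receiving anything.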
While the foregoing paradigm works well in many scenarios, it can be problematic in certain cases where query response time (i.e., the latency between submitting a query and receiving a result) is important. For example, consider an environment where a user is interacting with an information retrieval system in real time (via, e.g., a website or some other client-side interface). Due to the interactive nature of the environment, the user may expect to receive responses to submitted queries relatively quickly. However, because the “receive query/execute query/return response” paradigm requires each query to be executed in full upon query submission, if the execution time for a particular query is excessively long (due to, e.g., system load, the size of the data corpus being searched, and/or high query complexity), the user will have to wait a correspondingly long time before a result that is responsive to the query is returned. This, in turn, can adversely impact the usability of the system.
One approach for addressing the problem above is to cache the result for each query as it is generated. With this approach, when a user submits a previously executed query, the result can be retrieved directly from the cache (without re-executing the query). Unfortunately, this approach works poorly in situations where the data corpus is dynamic in nature (e.g., is modified and/or grows in size on a frequent basis). In these situations, conventional caching will generally be ineffective because the cached query results will become invalid quickly (e.g., on any subsequent data write operation), thus requiring subsequent instances of the same query to be re-executed in full on the most recent data.
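The caching approach and its weakness on a dynamic corpus can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any real system: the `CachingQueryEngine` class, its `write` and `query` methods, the substring-match query model, and the policy of clearing the whole cache on every write are all hypothetical.

```python
from typing import Dict, List


class CachingQueryEngine:
    """Illustrative engine that caches each query result as it is generated."""

    def __init__(self) -> None:
        self.corpus: List[str] = []
        self.cache: Dict[str, List[str]] = {}
        self.executions = 0  # counts full query executions, for demonstration

    def write(self, document: str) -> None:
        # Assumed policy: any write may change any result, so every
        # cached result is treated as invalid and discarded.
        self.corpus.append(document)
        self.cache.clear()

    def query(self, term: str) -> List[str]:
        if term in self.cache:
            # Cache hit: return the stored result without re-executing.
            return self.cache[term]
        # Cache miss: execute the query in full on the most recent data.
        self.executions += 1
        result = [doc for doc in self.corpus if term in doc]
        self.cache[term] = result
        return result


engine = CachingQueryEngine()
engine.write("red apple")
first = engine.query("apple")    # executed in full
second = engine.query("apple")   # served from the cache
engine.write("green apple")      # invalidates the cached result
third = engine.query("apple")    # must be executed in full again
```

In this sketch the repeated query is served from the cache only while the corpus is unchanged. If writes arrive between most queries, nearly every lookup misses, and the cache provides little benefit.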