Commercial search engines oftentimes have over a hundred billion documents indexed, which can then map to petabytes of data. Search engines rely upon systems comprising large numbers of machines grouped in clusters by functionality, such as index servers, documents servers, and caches.
In a web search engine, the query results in the cache expire after an index update. A common solution for this problem is to re-compute every query after the index update. However, this is a waste of time since most of the webpages do not change and some percentage of queries is repeated. Current search engines also suffer from the heavy daily index merge, where the whole index will be updated.