The present invention specifically relates to search applications in enterprise search systems. An enterprise search system is generally built upon search applications implemented by a search engine, as is well known to persons skilled in the art. The search engine itself is located on one or more servers connected and operated in a data communication network, which in enterprise search systems can usually be regarded as a local network or an intranet. The search engine comprises appropriate interfaces to clients or users of the enterprise search system as well as to content or document repositories belonging to an enterprise, and possibly also to document repositories located outside the enterprise and belonging to external information providers but accessible from the local data communication network. The search engine comprises a subsystem for indexing documents and content in repositories belonging to the enterprise or residing in external repositories. Indexes created by the indexing subsystem of the search engine are usually stored on the servers used by the enterprise search system. Further subsystems of the search engine handle user search queries, retrieve and analyse documents matching a search query, and present result sets of the matching documents to the user.
Current search systems tend to fall into one of two categories with regard to index types:
Live or active index, wherein all information can be accessed very quickly, but where there is typically a significant overhead in keeping the search system ready to respond.
Disk-based index, wherein information to be indexed is written sequentially to one or, usually, several files. These systems are characterized by much lower idling overhead, but also much higher latency and overhead when starting a search operation. Disk-based indexes are often denoted dormant or non-active, as opposed to live indexes, which can be accessed much faster.
However, both types have certain shortcomings with regard to search efficiency. In live indexes, as described above, the number of documents that can be stored on one server is usually limited by the memory available on the server. Further, the access structure of the index itself is usually based on inverted files of the content, such that only a limited number of disk or memory accesses are needed to identify the records that belong to a search result. This principle tends to break down when indexing very large corpora: individual search terms are likely to appear in a large number of documents, so the list of records to be read for each term grows with the size of the corpus.
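The inverted-file access structure mentioned above can be illustrated with a minimal sketch. It is not taken from the present specification; the function names `build_inverted_index` and `search`, the whitespace tokenization, and the sample documents are assumptions chosen purely for illustration. The sketch shows that a lookup touches only the lists for the query terms, and that each such list grows with the number of documents containing the term.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it.

    `docs` is a hypothetical mapping of document id -> text; terms are
    obtained by lowercasing and splitting on whitespace (an assumption).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Only the lists for the query terms are read, not the whole index.
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Illustrative documents only.
docs = {
    1: "enterprise search engine",
    2: "search index on disk",
    3: "live index in memory",
}
index = build_inverted_index(docs)
```

With these three documents, `search(index, "search index")` reads two lists of two entries each. In a very large corpus a common term may occur in a substantial fraction of all documents, so the corresponding list, and the work needed to read and intersect it, becomes correspondingly large.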