Searching for keywords or similar data items within a search domain made up of a number of documents typically involves the use of an index. Often, this is an inverted index, which associates each keyword with the documents that contain it.
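As an illustration of the inverted index described above, the following minimal sketch maps each keyword to the set of documents containing it. The function name `build_inverted_index`, the whitespace tokenization, and the sample documents are assumptions made for this example only, not details taken from any particular system.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it.

    `docs` is a dict of {doc_id: text}. Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical sample documents.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dog",
}
index = build_inverted_index(docs)
quick_docs = index["quick"]  # ids of the documents containing "quick"
```

Looking up a keyword is then a single dictionary access, rather than a scan over every document in the search domain.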
Where the search index is general purpose in nature, it must support a variety of types of searches. One common example is a keyword search, where the user supplies one or more keywords, or values, and the search result is all documents within the search domain which contain all of the keywords. Another example is a phrase search, where the user supplies a phrase made up of two or more words in a specified order. The search result in this case is all documents from the search domain which contain the phrase exactly as supplied (i.e., all words adjacent and in the same order). An index which supports phrase queries must contain significantly more data than one which does not, because it must record the position within each document of every occurrence of every word.
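The difference between the two query types can be sketched with a positional index, which stores for each word the positions at which it occurs in each document. This is a simplified illustration under stated assumptions: the names `build_positional_index`, `keyword_search`, and `phrase_search`, the whitespace tokenization, and the sample documents are all hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} (positions are word offsets)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def keyword_search(index, keywords):
    """Return ids of documents that contain every keyword, in any order."""
    doc_sets = [set(index.get(w.lower(), {})) for w in keywords]
    return set.intersection(*doc_sets) if doc_sets else set()

def phrase_search(index, phrase):
    """Return ids of documents containing the words adjacent and in order."""
    words = phrase.lower().split()
    if not words:
        return set()
    hits = set()
    # Only documents containing all of the words can contain the phrase.
    for doc_id in keyword_search(index, words):
        later = [set(index[w][doc_id]) for w in words[1:]]
        for start in index[words[0]][doc_id]:
            # The k-th following word must sit exactly k+1 positions later.
            if all(start + k + 1 in later[k] for k in range(len(later))):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample documents.
docs = {
    1: "new york city guide",
    2: "york is a new city",
    3: "guide to new york",
}
index = build_positional_index(docs)
keyword_hits = keyword_search(index, ["new", "york"])
phrase_hits = phrase_search(index, "new york")
```

With these sample documents, document 2 satisfies the keyword query but not the phrase query, since it contains both words but not adjacent and in order. The per-occurrence position lists are the extra data that a phrase-capable index has to carry.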
In order to meet the user's needs, searching must be both fast and accurate. At the index level this imposes competing requirements. The index must be complete in order to be accurate, but this drives a need for a larger index. The index must be small in order to be accessed quickly, but this drives a need to eliminate data. Compression schemes can be used to reduce the amount of data which must be read from storage, but this may not be sufficient to meet the user's need for quick results.
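The text does not name a particular compression scheme. As one common example, chosen here purely for illustration, a sorted list of document identifiers can be stored as gaps between consecutive identifiers, with each gap written as a variable-length sequence of bytes. The function names and sample values below are assumptions of this sketch.

```python
def encode_postings(doc_ids):
    """Delta-encode a sorted list of non-negative ids, then varint-encode the gaps.

    Each byte carries 7 bits of payload; the high bit is set when more
    bytes of the same gap follow.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decode_postings(data):
    """Invert encode_postings, recovering the original sorted id list."""
    doc_ids = []
    value = shift = prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes of this gap follow
        else:
            prev += value       # gap complete: add it to the running id
            doc_ids.append(prev)
            value = shift = 0
    return doc_ids

# Hypothetical posting list for a single keyword.
postings = [3, 7, 130, 131, 1000]
encoded = encode_postings(postings)
decoded = decode_postings(encoded)
```

Small gaps fit in a single byte, so densely occurring keywords compress well. The data must still be read and decoded at query time, which is consistent with the point above that compression alone may not deliver sufficiently quick results.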