Search engines employ a variety of techniques to perform search queries. Although search capabilities have become increasingly important and some natural-language-based search techniques have been developed, search has remained essentially constrained by tight limits on query length.
Currently, a fundamental technique for finding documents similar to a given query (for example, an entire document) is to select a minimal representation of keywords or phrases, e.g., 2 to 10 terms, and to use that minimal representation as the query input. Related entries are then found for each query term, often by means of an inverted list. The inverted list is a popular data structure for building an efficient index over large-scale text corpora. In an inverted list, words serve as primary keys, and the documents containing a given word are organized as that word's row of the list. By using an inverted list, a search engine achieves efficient response times for queries made up of a few terms.
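As an illustration of the inverted list described above, the following is a minimal Python sketch, not the implementation of any particular search engine. The corpus, the function names `build_inverted_index` and `search`, the whitespace tokenization, and the choice of conjunctive (AND) matching are all hypothetical assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word (primary key) to the sorted row of document IDs containing it.

    Assumption: naive lowercase whitespace tokenization, for illustration only.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query_terms):
    """Look up each query term's row and keep documents present in every row.

    Assumption: conjunctive (AND) semantics; real engines may rank instead.
    """
    rows = [set(index.get(term.lower(), [])) for term in query_terms]
    if not rows:
        return []
    return sorted(set.intersection(*rows))

docs = {
    1: "inverted index for text search",
    2: "long query search over text corpora",
    3: "efficient index structures",
}
index = build_inverted_index(docs)
print(index["index"])                     # [1, 3]
print(search(index, ["text", "search"]))  # [1, 2]
```

With only a couple of query terms, the engine touches just a couple of short rows, which is why this structure answers short queries quickly.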
However, existing query indexing techniques do not address the long-query problem, owing to the special properties of such queries, e.g., hundreds of terms, sparseness, and high dimensionality. Although retrieval techniques have been developed for short queries, e.g., 2 to 10 query terms, a long query, e.g., 100, 1,500, or 2,000 query terms, presents a different problem from that of a short query.
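To make the cost difference concrete, the following hypothetical Python sketch (not the method of this document) treats a query as a sparse term-weight vector, scores documents term-at-a-time against an inverted index, and counts the postings it visits. The tiny corpus, the function names, and the raw term-frequency dot-product weighting are illustrative assumptions. Because every distinct query term pulls in its entire posting list, the work grows with query length, which is one reason an index tuned for a few terms struggles with hundreds or thousands of them.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Inverted index: term -> list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term].append((doc_id, tf))
    return index

def score_query(index, query_text):
    """Term-at-a-time dot-product scoring (assumed weighting: raw counts).

    Returns (scores, postings_visited) so the per-query work is visible.
    """
    query = Counter(query_text.split())  # sparse query vector: term -> weight
    scores = defaultdict(float)
    visited = 0
    for term, q_weight in query.items():
        for doc_id, tf in index.get(term, []):  # walk this term's full posting list
            scores[doc_id] += q_weight * tf
            visited += 1
    return dict(scores), visited

docs = {1: "a b c", 2: "b c d", 3: "c d e"}
index = build_index(docs)
print(score_query(index, "a b"))           # ({1: 2.0, 2: 1.0}, 3)
print(score_query(index, "a b c d e")[1])  # 9
```

Even on this toy corpus, going from 2 to 5 query terms triples the postings visited; with long, sparse, high-dimensional queries over a large corpus the same effect dominates response time.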