Search engines discover and store information about documents, such as web pages, typically extracting that information from the documents' textual content. The documents are sometimes retrieved by a crawler or an automated browser, which may follow links in a document or on a website. Conventional crawlers typically analyze documents as flat text files, examining words and their positions (e.g., in titles, headings, or special fields). Data about analyzed documents may be stored in an index database for use in later queries. A query may include a single word or a combination of words.
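The indexing step described above can be illustrated with a minimal sketch of an in-memory inverted index. This is not any particular search engine's implementation. The names `tokenize`, `build_index`, and `search`, the field names, and the choice to require every query word to match are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the (doc_id, field, position) places where it occurs.

    `documents` is a hypothetical structure: {doc_id: {field_name: text}}.
    """
    index = defaultdict(list)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for position, word in enumerate(tokenize(text)):
                index[word].append((doc_id, field, position))
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # One set of matching document ids per query word, then intersect them.
    doc_sets = [{doc_id for doc_id, _, _ in index.get(word, [])} for word in words]
    return set.intersection(*doc_sets)
```

Under these assumptions, a single-word query returns every document containing that word, while a multi-word query returns only documents containing all of the words. The stored field and position are what would let a ranking step treat a match in a title differently from a match in body text.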
A long query can often express a user's information need better than a short query. For example, the addition of qualifying phrases can help describe a user's target more precisely and express more complex relationships among terms. However, web search results for long queries are notoriously worse than those for short queries. Attempts to improve long-query results may be classified into five categories: query reduction, query expansion, query reformulation, term and concept weighting, and query segmentation.