Prior to the proliferation of electronically available information over the Internet, computerized retrieval of information could be approached in a relatively organized fashion. Documents having widespread interest were typically maintained only by commercial database providers, which categorized them (by subject, date, etc.), and perhaps abstracted them, thereby facilitating multiple modes of searching. Consequently, a database user effectively narrowed the search space at the outset merely by choosing the appropriate database, which would limit the searchable documents to the topic of interest. Then, the user could retrieve documents from the selected database based on any of a variety of search criteria other than simple "key words": date of publication, contents of a category-specific document field, title or author, to name but a few.
While commercial database providers still exist, increasing amounts of information are stored on servers accessible over the Internet, which frequently make it available free of charge. Information on the Internet, of course, is both vast and utterly disorganized in the sense of lacking any hierarchical or category-based indexing scheme. Particular kinds of documents may be found on large numbers of servers, so that arbitrarily confining one's search to a single such server is likely to miss numerous relevant documents located elsewhere.
To allow Internet users to focus their searching efforts, several firms have created free-of-charge sites called "search engines." These systems maintain huge and constantly growing databases duplicating the text (or portions thereof) of thousands or even millions of documents accessible over the Internet, and permit "visitors" to the site to formulate queries that the search engine applies to its database. The search engine retrieves documents matching the query, often ranked in order of relevance (e.g., in terms of the frequency and location of word matches or some other statistical measure).
Unfortunately, the sheer volume of documents and their lack of organization, combined with the limited searching capabilities of most search engines, make it very likely that relevant documents will be missed or elude notice amidst a plethora of irrelevant retrievals. In order to guide these simple types of searches, the proprietors of documents available over the Internet frequently provide them with "headers" which, while invisible to someone retrieving the document, are nonetheless acquired by search engines and form part of the searchable text of the document. A document may, for example, repeat a key word over and over in its invisible header, thereby ensuring that matches to queries containing the key word will receive a high relevance rank (since each repetition in the header counts as a separate match).
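The effect of a repeated header key word on a frequency-based relevance rank can be illustrated with a minimal sketch. The scoring rule (a raw count of key-word occurrences in the header plus the visible body), the function names, and the two sample documents are assumptions made purely for illustration; they do not describe any particular search engine.

```python
import re


def match_count(searchable_text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword."""
    words = re.findall(r"[a-z]+", searchable_text.lower())
    return words.count(keyword.lower())


def relevance(document, keyword):
    """Hypothetical score: the invisible header is searched along with the body."""
    return match_count(document["header"] + " " + document["body"], keyword)


# Two made-up documents: one genuinely on topic, one with a stuffed header.
documents = {
    "on_topic": {
        "header": "",
        "body": "Report on troops stationed along the border.",
    },
    "stuffed": {
        "header": "troops " * 20,  # key word repeated in the invisible header
        "body": "Discount boots for sale.",
    },
}

scores = {name: relevance(doc, "troops") for name, doc in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranking)
```

Under this assumed scoring rule, the document with the stuffed header outranks the on-topic document even though its visible text never mentions the key word.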
Nonetheless, key-word searching remains limited, frequently resulting in missed entries (due to synonymous ways of expressing the relevant concept) or, even more frequently, a flood of irrelevant entries (due to the multiple unrelated meanings that may be associated with words and phrases). For example, someone interested in military activities in China might attempt to search using the query "troops in China." But because of the numerous and varied topics that may implicate virtually any chosen set of words, the search engine might retrieve documents containing the following sentences:
1. Bill Clinton plans meeting with leaders of China to talk about US troops in Taiwan.
2. Troops in Russia improve border security with China.
3. Leader of NATO troops in Bosnia to visit China.
4. Farmer finds crashed WWII troop carrier in southern China.
5. CIA papers reveal US troops in Cambodia near border of China during Vietnam War.
6. Asia expert, Johnson, talks to leaders of US troops about new weapons factories in China.
7. British troops in Hong Kong have mixed reaction to handover of Hong Kong to China.
8. Troops in controversy over design for new china.
9. Troops wear boots made in China.
10. Troops of General Chun put down protest in China.
Of course, only the last item is relevant to the user's intent.
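The behavior described above can be reproduced with a minimal sketch of key-word matching applied to the ten example sentences. The matching rule is an assumption chosen for illustration: a sentence is retrieved when it contains every query word, compared case-insensitively after a crude normalization that strips a trailing "s" (so that "troop" matches "troops"). The function names are hypothetical.

```python
import re

SENTENCES = [
    "Bill Clinton plans meeting with leaders of China to talk about US troops in Taiwan.",
    "Troops in Russia improve border security with China.",
    "Leader of NATO troops in Bosnia to visit China.",
    "Farmer finds crashed WWII troop carrier in southern China.",
    "CIA papers reveal US troops in Cambodia near border of China during Vietnam War.",
    "Asia expert, Johnson, talks to leaders of US troops about new weapons factories in China.",
    "British troops in Hong Kong have mixed reaction to handover of Hong Kong to China.",
    "Troops in controversy over design for new china.",
    "Troops wear boots made in China.",
    "Troops of General Chun put down protest in China.",
]


def normalize(word):
    """Crude plural stripping (an assumption, not a real stemmer)."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def terms(text):
    """Lower-case the text and return its set of normalized words."""
    return {normalize(w) for w in re.findall(r"[a-z]+", text.lower())}


def keyword_search(query, sentences):
    """Return every sentence containing all of the query's words."""
    wanted = terms(query)
    return [s for s in sentences if wanted <= terms(s)]


retrieved = keyword_search("troops in China", SENTENCES)
print(len(retrieved), "of", len(SENTENCES), "sentences retrieved")
```

Under these assumptions every one of the ten sentences satisfies the query, since word matching alone cannot tell the one relevant sentence apart from the other nine.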