The invention relates generally to information organization and retrieval. More specifically, the invention relates to advertising and document relevance determination.
The Internet, which is a global network of interconnected networks and computers, commonly makes available a wide variety of information through a vehicle known as the World Wide Web (WWW). Currently, hundreds of millions of “web sites” that house and format such information in documents called web pages are available to users of the Internet. Since the content of such pages is ungoverned, unregulated and largely unorganized between one site and the next, finding certain desired information is made difficult.
To aid users in finding sites or pages having information they desire, search engines were developed. Search engines and directories attempt to index pages and/or sites so that users can find particular information. Typically, search engines are initiated by prompting a user to type in one or more keywords of their choosing along with connectors (such as “and”) and delimiters. The search engine matches the keywords with documents or categories in an index that contain those keywords or are indexed by those keywords and returns results (either categories or documents or both) to the user in the form of URLs (Uniform Resource Locators). One predominant web search engine receives submissions of sites and manually assigns them to categories within its directory. When the user types in a keyword, a literal sub-string match of that keyword with either the description of the site in the index or the name of the category occurs. The results of this sub-string search will contain some sites of interest, but in addition, may contain many sites that are not relevant or on point. Though one may refine the search with yet more keywords, the same sub-string match will be employed, only now applied to the result set just obtained. Almost all search engines attempt to index sites and documents and leave it to the user to formulate an appropriate query and then to eliminate undesired search results. Recently, other search engines using natural language queries have been developed, but these also often result in many undesired responses.
The quality of the results obtained varies, but by doing essentially sub-string matches or category browsing, the engines are unable to properly discern what the user actually intends or means when a particular keyword is entered. Thus, the response to search terms entered is a list of documents/sites that may bear little relation to the intended meaning/usage of the term(s) entered.
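By way of illustration only, the literal sub-string matching described above may be sketched as follows. The directory entries, URLs and the function name `substring_search` are hypothetical and are not taken from any particular search engine; the sketch assumes a case-insensitive match against the category name and the site description.

```python
# Hypothetical directory entries: (category name, site description, URL).
DIRECTORY = [
    ("Computers", "Internet service providers", "http://example.com/isp"),
    ("Sports", "Fishing nets and tackle", "http://example.com/nets"),
    ("Furniture", "Handmade cabinets", "http://example.com/cabinets"),
]


def substring_search(keyword, entries):
    """Return the entries whose category name or description contains
    the keyword as a literal, case-insensitive sub-string."""
    needle = keyword.lower()
    return [
        entry
        for entry in entries
        if needle in entry[0].lower() or needle in entry[1].lower()
    ]


# A user who means "the Internet" types "net". All three entries match,
# because "net" also occurs inside "nets" and "cabinets".
results = substring_search("net", DIRECTORY)

# Refining with a further keyword applies the same sub-string match,
# but only to the result set just obtained.
refined = substring_search("fish", results)
```

In this sketch the match depends only on character sequences, so nothing in it can distinguish the meaning the user intended from the unrelated entries that happen to share those characters.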
One corollary response to search terms input into a search engine is the retrieval and display of advertising icons often referred to as “banner ads.” One ad-buying model for companies and web sites desiring to advertise on a search engine is to purchase one or more search terms. When the search term(s) are input by a user of the search engine, the corresponding banner ad is displayed. Again, because of the limitation of most search engines, the advertiser must purchase several search terms to cover a given concept or meaning that may have multiple equivalent terms associated with it. For instance, a computer manufacturer may have to buy “computer,” “PC,” and “desktop” if it desires to have an ad for computers appear. If the advertiser cannot afford the increase in cost, certain equivalent expressions may be entirely missed by the ad campaign. One other factor with banner ads is that related terms or expressions may be completely ignored. For instance, the term “hardware” is not equivalent to, but is related to, “computer” and may be a related concept the advertiser desires to capture but cannot unless it is explicitly purchased.
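By way of illustration only, the purchased-term model for banner ads may be sketched as follows. The table `PURCHASED_TERMS`, the ad identifier and the function name `select_banner` are hypothetical; the sketch assumes an exact, case-insensitive match between each word of the query and the purchased terms.

```python
# Hypothetical purchases: the advertiser bought "computer" and "pc",
# but could not afford "desktop" and did not buy "hardware".
PURCHASED_TERMS = {
    "computer": "ad-computers",
    "pc": "ad-computers",
}


def select_banner(query, purchased):
    """Return the banner ad for the first query word that was purchased,
    or None when no word of the query matches a purchased term."""
    for word in query.lower().split():
        if word in purchased:
            return purchased[word]
    return None


shown = select_banner("cheap PC", PURCHASED_TERMS)               # purchased term
missed_equivalent = select_banner("desktop deals", PURCHASED_TERMS)  # equivalent, not bought
missed_related = select_banner("hardware", PURCHASED_TERMS)          # related, not bought
```

Under these assumptions the ad appears only for the terms actually purchased; the equivalent term “desktop” and the related term “hardware” yield no ad.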
The responses to a given search term are often based upon the manner in which documents or pointers to documents are indexed in a directory. Internet search engines often index documents and pointers to those documents based upon one or more keywords, which may be embedded within the document, automatically determined by analyzing the document, or input manually by a user or reviewer desiring to have the document indexed. Some of these methods of indexing rely on the precision of the terms used rather than on concepts or meanings.