There are different kinds of search services that help the user find relevant Web pages on the Internet. Directories, like Yahoo, use human editorial teams to classify the websites they find into a category tree. They resemble telephone directories, in which the user looks up a category such as Auto Repair.
Search engines, like Google, FAST, Inktomi or AltaVista, send out their spiders in an attempt to visit every page of every Web site, and every bit of information they find is indexed. Their index consists of every word extracted from the documents found by their spiders. A search query submitted by a user is compared against the index and a list of relevant search results is constructed.
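By way of illustration only, the word-level index described above can be sketched as a small inverted index. The page addresses, the sample texts and the function names `tokenize`, `build_index` and `search` are hypothetical, and real search engines add crawling, ranking and many other steps that are not shown here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word found in any page to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for documents fetched by a spider.
pages = {
    "shop.example/potter": "Harry Potter book with a chapter on Quidditch",
    "garage.example/repair": "Auto repair and tire service",
}
full_text_index = build_index(pages)
```

In this sketch every word of a page is searchable, so a query such as "quidditch" reaches the bookstore page without any action by the site owner.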
A third type of search service enables website owners to manually insert key terms of their choice into the search index. In this type of service, operated for example by companies such as Overture and FindWhat, a search query submitted by a user is compared against an index that consists of a list of key terms. Usually (but not necessarily), a ‘hard match’ is required between the submitted search query and one of the keywords or key-phrases (“Key terms”) in the index in order for the search service to provide the user with search results. Website owners who submit a Web page to such a search service have to find the Key terms that best fit the submitted Web page. For example, the terms ‘Harry Potter’ and ‘book’ could be submitted for a certain page within an online bookstore. Any time these terms are submitted to the search service by a user, the said Web page would probably appear within the search results (depending on the specific ranking algorithms used by the search service). Any other search query submitted by the user would not bring up the said Web page, even if the terms of that query appeared somewhere within the text of that Web page. For example, the word ‘Quidditch’ may appear within the body of the text of the said Web page, but this keyword will never be matched by the search service to a user search query as long as the Web site owner does not submit it. The same holds true when a user submits a search query with a spelling mistake, a partial query (which consists of a sub-string of the indexed key terms), a query in which the words do not appear in the same order as in the indexed key terms, etc. In all such cases (and assuming that no misspellings or other such variations were manually submitted to the index), the search service may not provide the user with search results for the submitted query.
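The ‘hard match’ behavior described above can be illustrated with a minimal sketch. The class `KeyTermIndex`, the helper `normalize` and the page address are hypothetical, and the case and whitespace folding is an assumption; the sketch does not represent the actual implementation of any named search service.

```python
def normalize(term):
    """Fold case and collapse whitespace (an assumed, minimal normalization)."""
    return " ".join(term.lower().split())


class KeyTermIndex:
    """Index of owner-submitted key terms, matched exactly against queries."""

    def __init__(self):
        self._pages_by_term = {}

    def submit(self, url, key_terms):
        """Register the key terms a site owner chose for a page."""
        for term in key_terms:
            self._pages_by_term.setdefault(normalize(term), []).append(url)

    def search(self, query):
        """Return pages only when the whole query equals a submitted key term."""
        return list(self._pages_by_term.get(normalize(query), []))


key_index = KeyTermIndex()
key_index.submit("shop.example/potter", ["Harry Potter", "book"])

print(key_index.search("harry potter"))  # exact key term
print(key_index.search("Quidditch"))     # word on the page, never submitted
print(key_index.search("Hary Potter"))   # spelling mistake
print(key_index.search("Potter Harry"))  # different word order
print(key_index.search("Harry"))         # partial query
```

Only the first query returns the page in this sketch; each of the others fails the exact lookup, which corresponds to the cases listed in the paragraph above.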
One attempt to increase the utility of search engines, by providing an “intelligent” search for concepts related to the submitted query, is described in U.S. Pat. No. 6,453,315 to Applied Semantics Inc. This reference describes a method for mapping relationships between concepts, so that the closeness in “meaning” between a search query and searchable information is determined. Searchable information which is closest in “meaning” to the query may then be returned as the search result.
One significant drawback of such a method is that “meaning” is relatively vague and not easily determined, so that “closest in meaning” is also difficult to determine. The above-referenced patent attempts to determine “meaning” by defining a semantic space of similar or related concepts. These concepts must be predetermined in terms of their relationships and similarity to each other; the key terms can then be mapped to the concepts, for determining “closest in meaning”. Target documents, such as Web pages for example, can then be assigned locations within the semantic space as part of preprocessing, before a search query is submitted. These locations relate to the score of the target documents for particular mapped concepts.
Although this method has the advantage of being capable of a mathematical implementation, and hence of being operated by a computer, it has many disadvantages. In particular, it requires predetermined relationships between concepts to be known before any processing of target documents is possible. In other words, the content of the actual documents must be subordinate to the previously determined conceptual map. Should the content fail to be well expressed or well determined by the conceptual map, then either the map must be redone or the search queries may fail to obtain the most relevant documents. Thus, the above-referenced patent fails to describe a method which may be flexibly adjusted according to the content of the documents.
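The general idea of a predetermined concept map, and the rigidity noted above, can be illustrated with a toy sketch. It is not the method of the above-referenced patent: the concept names, the weights, the sample documents and the use of cosine similarity as a measure of closeness are all assumptions chosen for illustration.

```python
import math

# Hypothetical, hand-built concept map: each known term scores against concepts.
CONCEPTS = ("fiction", "retail", "automotive")
CONCEPT_MAP = {
    "potter": {"fiction": 1.0},
    "book": {"fiction": 0.5, "retail": 0.5},
    "wizard": {"fiction": 1.0},
    "repair": {"automotive": 1.0},
    "tire": {"automotive": 0.7, "retail": 0.3},
}


def locate(text):
    """Assign a text a location in concept space by summing its term scores."""
    location = dict.fromkeys(CONCEPTS, 0.0)
    for word in text.lower().split():
        for concept, weight in CONCEPT_MAP.get(word, {}).items():
            location[concept] += weight
    return location


def closeness(a, b):
    """Cosine similarity between two locations (0.0 if either is the origin)."""
    dot = sum(a[c] * b[c] for c in CONCEPTS)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Preprocessing: documents are located before any query is submitted.
DOCS = {
    "shop.example/potter": "harry potter book",
    "garage.example/repair": "auto repair tire service",
}
LOCATIONS = {url: locate(text) for url, text in DOCS.items()}


def closest(query):
    """Return the document whose location is closest to the query's location."""
    q = locate(query)
    return max(LOCATIONS, key=lambda url: closeness(q, LOCATIONS[url]))
```

In this sketch a query such as "wizard book" lands nearest the bookstore page even though "wizard" does not occur in it, while a term absent from the map, such as "quidditch", is located at the origin and is equally distant from every document until the map itself is revised.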
Targeted advertising is also usually performed by having Web site owners submit key terms to search engines that serve as advertising platforms. Advertisers use these platforms to buy traffic for their Web sites from users whose search queries are identical or very similar to the advertiser's submitted key terms. The most common business model is the pay-per-click (PPC) model, in which the advertiser pays for click-throughs to its Web site. Hereinafter, the term “PPC search engine” refers to any type of search engine that compares a user search query against a list of pre-submitted key terms that are assigned to documents. For example, U.S. Pat. No. 6,269,361 describes a system for allowing a Web site owner to influence the position of their site link or information in search results presented to a user, by purchasing the position and/or paying money to positively influence the location of their Web site in the search results.
As noted above with regard to the patent of Applied Semantics, targeted advertising can only be as accurate as the method of targeting. The method described in the Applied Semantics patent is rigid, and may also fail when those who determine the concept mapping do not understand cultural or other differences, for example when attempting to prepare such a map for different countries and/or languages. Thus, an improved method for determining the “meaning” of Web pages and other documents would clearly bring many benefits.