Field of the Invention
The present invention is directed to search systems and methods that facilitate identification and retrieval of additional electronic documents associated with a reference document provided in a search results page. More specifically, the present invention is directed to a search system and method that use text function tagging.
Description of Related Art
Electronic searching across large document corpora is one of the most broadly utilized applications on the Internet, and in the software industry in general. Regardless of whether the source to be searched is a proprietary or open-standard database, a document index, or a hypertext collection, and regardless of whether the search platform is the Internet, an intranet, an extranet, a client-server environment, or a single computer, searching for a few matching texts out of countless candidate texts is a frequent need and an ongoing challenge for almost any application.
One fundamental search technique is the keyword-index search, which revolves around an index of keywords from eligible electronic documents. In this method, a user's inputted query is parsed into individual words (optionally stripped of some inflected endings), whereupon the words are looked up in the index, which, in turn, points to the electronic documents or items indexed by those words. Thus, the potentially relevant electronic documents are identified and displayed for the user, generally in a sequence ordered by the presumed relevance of each document to the search query. This sort of search service, in one form or another, is accessed countless times each day by many millions of computer and Internet users, and it is the basis of the Internet search services provided by Lycos®, Yahoo®, and Google®, used by tens of millions of Internet users daily.
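The keyword-index search described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and is not drawn from any particular search service: the function names (tokenize, build_index, search), the crude plural stripping used to stand in for removal of inflected endings, the scoring by count of matched query words, and the three sample documents are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words, stripping punctuation.

    Assumption: a trailing "s" is removed from words longer than three
    letters as a crude stand-in for stripping inflected endings.
    """
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        if word:
            words.append(word)
    return words


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return document ids ordered by the number of query words matched.

    Assumption: relevance is approximated by the count of distinct
    matched query words; ties are broken by document id.
    """
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical sample corpus.
documents = {
    "doc1": "History of Ford engines",
    "doc2": "Ford engine repair and performance tuning",
    "doc3": "Gardening tips for spring",
}
index = build_index(documents)
results = search(index, "Ford engine repair")
print(results)  # ['doc2', 'doc1']
```

In this sketch, doc2 is ranked ahead of doc1 because it matches three of the query words rather than two, and doc3 is not returned because it shares no indexed word with the query.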
Whenever users perform keyword searches over large document collections, they are usually left wanting more relevant electronic documents than those identified in the initial search results and displayed on a search engine results page (“SERP”). A common method of assisting users in this predicament is to provide a hyperlink associated with each of the particular “hits” (i.e., particular documents identified) on the SERP. Upon selection of this hyperlink, the user interface typically displays another page with additional electronic documents that are supposedly relevant to the document for which the hyperlink was selected. This optional feature is generally implemented by providing a “More like this” or “Similar pages” hyperlink placed immediately next to the hyperlink of each document in the SERP.
Typically, upon selection of such a “Similar pages” link, the search engine identifies and displays additional electronic documents that have surface characteristics in common with the document for which the “Similar pages” link was selected. The common characteristics between the displayed document and the additional documents may include identical or similar content words in their titles, and/or the presence of identical or similar keywords submitted in the user's search query. If a topic category is known, the additional documents identified may be required to match the category of the reference document as well.
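The surface-characteristic matching described above can be sketched as follows. This is a hypothetical illustration rather than the implementation of any actual search engine: the function names (content_words, similar_pages), the stopword list, the min_shared threshold, the additive score, and the sample documents are all assumptions introduced for the example.

```python
def content_words(text, stopwords=frozenset({"a", "an", "and", "of", "the", "to", "for"})):
    """Return the set of lowercase content words in text (assumed stopword list)."""
    words = {w.strip(".,;:!?\"'()") for w in text.lower().split()}
    return words - stopwords - {""}


def similar_pages(reference, candidates, query, min_shared=2):
    """Return ids of candidates sharing surface characteristics with reference.

    Assumption: the score is the number of shared title words plus the
    number of query keywords present in the candidate's text.
    """
    ref_title = content_words(reference["title"])
    query_words = content_words(query)
    scored = []
    for doc in candidates:
        if doc["id"] == reference["id"]:
            continue
        # If the reference category is known, require the candidate to match it.
        if reference.get("category") and doc.get("category") != reference["category"]:
            continue
        shared_title = ref_title & content_words(doc["title"])
        shared_query = query_words & content_words(doc["text"])
        score = len(shared_title) + len(shared_query)
        if score >= min_shared:
            scored.append((score, doc["id"]))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical sample documents.
reference = {
    "id": "history",
    "title": "Ford engine history",
    "text": "A history of Ford engine performance",
    "category": "automobiles",
}
candidates = [
    reference,
    {
        "id": "repair",
        "title": "Ford engine repair guide",
        "text": "How to repair a Ford engine for better performance",
        "category": "automobiles",
    },
    {
        "id": "garden",
        "title": "Spring gardening guide",
        "text": "Planting tips",
        "category": "gardening",
    },
]
similar = similar_pages(reference, candidates, "Ford engine performance")
print(similar)  # ['repair']
```

In this sketch, the candidate is selected purely on shared title words, shared query keywords, and a matching category; nothing in the comparison considers what the candidate document is actually about.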
Many search engines and systems that implement such a “Similar pages” feature for identifying and displaying additional documents to the user suffer from identifying documents that are not actually relevant to the reference document. In particular, there are problems with recommending documents in an overlapping domain or category, even when they share many of the keywords of a user's search query. For example, a website concerning the history of automobiles and another website concerning their repair may both be members of the same general domain of “automobiles,” and may both share common keywords such as “Ford,” “performance,” and “engine,” even though a user interested in one sub-domain may find the other completely irrelevant. Thus, the actual relevance of the additional documents identified and displayed may be very low, and the documents may not be very helpful to the user.
Therefore, there exists an unfulfilled need for a search system and method that address the above-identified limitations of conventional search engines and of the “Similar pages” feature used in such conventional search engines. In particular, there exists an unfulfilled need for a search system that recommends to the user additional documents that are relevant to the main document for which a “Similar pages” link is selected, rather than recommending irrelevant documents.