The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for advanced search-term disambiguation.
The Internet is a global network of computers and networks joined together by means of gateways that handle data transfer and the conversion of messages from a protocol of the sending network to a protocol used by the receiving network. On the Internet, any computer may communicate with any other computer, with information traveling over the Internet through a variety of languages, also referred to as protocols. The set of protocols used on the Internet is called Transmission Control Protocol/Internet Protocol (TCP/IP).
The Internet has revolutionized communications and commerce, as well as being a source of both information and entertainment. With respect to transferring data over the Internet, the World Wide Web environment, also referred to simply as “the Web,” is used. The Web is a mechanism used to access information over the Internet. In the Web environment, servers and clients effect data transactions using the hypertext transfer protocol (HTTP), a known protocol for handling the transfer of various data files, such as text files, graphic images, animation files, audio files, and video files.
On the Web, the information in various data files is formatted for presentation to a user by a standard page description language, the hypertext markup language (HTML). Documents using HTML are also referred to as Web pages. Web pages are connected to each other through links or hyperlinks. These links allow for a connection to other Web resources identified by a uniform resource identifier (URI), such as a uniform resource locator (URL).
A browser is a program used to look at and interact with all of the information on the Web. A browser is able to display Web pages and to traverse links to other Web pages. Resources, such as Web pages, are retrieved by a browser, which is capable of submitting a request for the resource. This request typically includes an identifier, such as, for example, a URL. As used herein, a browser is an application used to navigate or view information or data in any distributed database, such as the Internet or the World Wide Web.
Given the amount of information available through the World Wide Web, search engines have become valuable tools for finding content that is relevant to a given user. A search engine is a software program or Web site that searches a database and gathers and reports information that contains or is related to specified terms. However, given the vast amount of information on the Internet, search results often include millions, or even tens of millions, of matching files, which are referred to as “hits.” Many of these hits may be irrelevant to the user's intended search. For example, if a user were to request a search of the term “mercury,” the results could include hits related to the element, the automobile manufacturer, the record label, the Roman god, the NASA manned spaceflight project, or some other category.
One solution to this problem is to include more terms in the search request to disambiguate the search. In the above example, the user may refine the search to include “mercury AND car.” However, it is up to the user to determine which terms to add to refine the search.
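The effect of adding a term can be illustrated with a minimal Python sketch. The hit list, the `tokenize` helper, and the `search` function are hypothetical and stand in for a real search engine index; the sketch assumes only that an `AND` query keeps hits containing every listed term.

```python
import re

# Hypothetical hits for illustration only; not taken from any real search engine.
HITS = [
    "Mercury is a chemical element with symbol Hg",
    "Mercury car dealership and used car listings",
    "Mercury Records label discography",
    "Mercury, Roman god of commerce",
    "Project Mercury, the NASA manned spaceflight program",
    "Classic Mercury Cougar car restoration",
]


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, hits=HITS):
    """Return the hits that contain every term of an AND-joined query."""
    terms = [t.strip().lower() for t in query.split(" AND ") if t.strip()]
    return [h for h in hits if all(t in tokenize(h) for t in terms)]


if __name__ == "__main__":
    print(len(search("mercury")))          # every hit mentions "mercury"
    for hit in search("mercury AND car"):  # only the automobile-related hits remain
        print(hit)
```

The refinement works only because the user already knew that “car” separates the intended meaning from the others, which is the burden described above.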
One high-tech solution is to use a clustering search engine, which groups the results of a search into clusters. These search engines are metasearch engines, which send user requests to several other search engines and/or databases and return the results from each one. They allow users to enter their search criteria only one time and access several search engines simultaneously.
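The metasearch step can be sketched as follows. This is an illustrative Python sketch, not the implementation of any particular product: `engine_a` and `engine_b` are hypothetical stand-ins for remote search engines, and the merge policy (concatenate in engine order, drop duplicates) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical search engine returning a ranked list of hits."""
    return [f"{query} element", f"{query} car"]


def engine_b(query):
    """Second hypothetical search engine with partly overlapping hits."""
    return [f"{query} car", f"{query} god"]


def metasearch(query, engines):
    """Send one query to several engines concurrently and merge the hits."""
    with ThreadPoolExecutor() as pool:
        # map() preserves the order of `engines`, so the merge is deterministic.
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    merged, seen = [], set()
    for results in result_lists:
        for hit in results:
            if hit not in seen:  # keep only the first copy of a duplicate hit
                seen.add(hit)
                merged.append(hit)
    return merged


if __name__ == "__main__":
    print(metasearch("mercury", [engine_a, engine_b]))
```

The user enters the query once, and the duplicate hit reported by both engines appears a single time in the merged list.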
A cluster is a group of similar topics that are related to the original query. The clusters are presented to the user through folders. The aim of this search engine technique is to organize numerous search results into several meaningful categories (clusters). The user gets an overview of the available themes or topics. Via one or two clicks on a folder and/or subfolders, the user may arrive at relevant search results that would be too far down in the ranking of a traditional search engine. In addition, the user may view similar results together in folders rather than scattered throughout a seemingly arbitrary list. While clustering search engines organize results into categories, these categories take no account of the intention of the user. Given only a search query, no one category can be given a higher relevancy than any other. In addition, the algorithm used by a typical clustering engine produces human-readable category names that may often be ambiguous themselves.
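The grouping of hits into folders can be sketched in Python as follows. Simple keyword matching stands in here for a real clustering algorithm, and the folder names, keyword sets, and sample hits are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical folder labels and trigger keywords; a real clustering engine
# would derive both from the result set itself.
FOLDER_KEYWORDS = {
    "Chemistry": {"element", "hg"},
    "Automobiles": {"car", "cougar"},
    "Music": {"records", "label"},
    "Mythology": {"god", "roman"},
    "Spaceflight": {"nasa", "spaceflight"},
}


def cluster(hits):
    """Assign each hit to the first folder whose keywords it mentions."""
    folders = defaultdict(list)
    for hit in hits:
        tokens = set(re.findall(r"[a-z0-9]+", hit.lower()))
        label = next(
            (name for name, keywords in FOLDER_KEYWORDS.items() if tokens & keywords),
            "Other",  # fallback folder for hits matching no keyword
        )
        folders[label].append(hit)
    return dict(folders)


if __name__ == "__main__":
    sample_hits = [
        "Mercury is a chemical element",
        "Mercury car dealership",
        "Project Mercury NASA spaceflight",
        "Classic Mercury Cougar restoration",
        "Mercury retrograde horoscope",
    ]
    for folder, members in cluster(sample_hits).items():
        print(folder, members)
```

The sketch returns the folders in the order they were first encountered. Nothing in the query “mercury” alone indicates which folder the user wants, so no folder can be ranked above the others.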