1. Field of the Invention
The present invention is directed to a computer-implemented product for locating and connecting to a particular desired object or target resource from among plural resources resident at distributed locations on a network.
2. Description of the Related Art
The worldwide network of computers known as the Internet evolved from military and educational networks developed in the late 1960's. Public interest in the Internet has increased of late due to the development of the World Wide Web (hereinafter, the Web), a subset of the Internet that includes all connected servers offering access to hypertext transfer protocol (HTTP) space. To navigate the Web, browsers have been developed that give a user the ability to download files from Web pages, which are data files stored on server electronic systems and written in HyperText Mark-Up Language (HTML). Web pages may be located on the Web by means of their electronic addresses, known as Uniform Resource Locators (URLs).
A URL uniquely identifies the location of a resource (web page) within the Web. Each URL consists of a string of characters defining the type of protocol needed to access the resource (e.g., HTTP), a network domain identifier, identification of the particular computer on which the resource is located, and directory path information within the computer's file structure. The domain name is assigned by Network Solutions Registration Services after completion of a registration process.
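By way of illustration only, the URL components listed above can be separated with the Python standard library. The following is a minimal sketch; the function name url_components and the address www.example.com are hypothetical and do not appear in the systems discussed here. The host field returned here combines the domain identifier and the particular computer name.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict:
    """Split a URL into the parts described in the text (illustrative only)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # type of protocol, e.g. "http"
        "host": parts.hostname,     # computer name plus network domain
        "path": parts.path,         # directory path within the file structure
    }


# Hypothetical example address, used only to show the decomposition.
components = url_components("http://www.example.com/docs/index.html")
print(components)
```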
While the amount of information available on the Web is enormous, and therefore potentially of great value, the sheer size of the Web makes the search for information, and particular web sites or pages, a daunting task. Search engines have been developed to assist persons using the Web in searching for web pages that may contain useful information.
Search engines fall into two major categories. In search engines falling into the first category, a service provider compiles a directory of Web sites that the provider's editors believe would be of interest to users of the service. The Yahoo site is the best known example of such a provider. Products in this category are not, strictly speaking, search engines, but directories, and will be referred to hereinafter as “editor-controlled directories”. In an editor-controlled directory, the developer of the directory (the “editor”) determines, based upon what it believes users want, what search terms map to what web pages.
The other major category, exemplified by Altavista, Lycos, and Hotbot, uses search programs, called “web crawlers”, “web spiders”, or “robots”, to actively search the Web for pages to be indexed, which are then retrieved and scanned to build indexes. Most commonly, this is done by processing the full text of the page and extracting words, phrases, and related descriptors (word adjacencies, frequencies, etc.). This is often supplemented by examining descriptive information about the Web document contained in a tag or tags in the header of a page. Such tags are known as “metatags” and the descriptive information contained therein as “metadata”. These products will be referred to hereinafter as “author-controlled search engines,” since the authors of the Web documents themselves control, to some extent, whether or not a search will find their document, based upon the metadata that the author includes in the document.
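As a rough illustration of how metadata might be read from the header of a page, the following sketch uses the Python standard library HTML parser. It is not the method of any particular search engine named above; the class name MetaTagParser, the function extract_metadata, and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before this call.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_metadata(html: str) -> dict:
    """Return the metatag metadata found in an HTML document."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.metadata


# Sample page invented for this example.
page = (
    "<html><head><title>Example</title>"
    '<meta name="keywords" content="cars, trucks">'
    '<meta name="description" content="A sample page">'
    "</head><body>Hello</body></html>"
)
print(extract_metadata(page))
```

Because the author of the page writes these tags, an index built from them reflects whatever terms the author chose to include.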
Each type of product has its disadvantages. Author-controlled search engines tend to produce search results of enormous size. However, they have not been reliable in reducing the large body of information to a manageable set of relevant results. Further, web site authors often attempt to skew their site's position in the search results of author-controlled search engines by loading their web site metatags with multiple occurrences of certain words commonly used in searches.
Editor-controlled directories are more selective in this regard. However, because conventional editor-controlled directories do not actively search the web for matches to particular search terms, they may miss highly relevant web sites that were not deemed by the editors to be worthy of inclusion in the directory. Also, it is possible for the editor to “play favorites” among the multitude of Web documents by mapping certain Web documents to more search terms than others.
Recently, search engines such as DirectHit (www.directhit.com) have introduced feedback and learning techniques to increase the relevancy of search results. DirectHit purports to use feedback to iteratively modify search result rankings based on which search result links are actually accessed by users. Another factor purportedly used in the DirectHit service in weighting the results is the amount of time the user spends at the linked site. The theory behind such techniques is that, in general, the more people that click on a search result, and the longer the amount of time they spend there, the greater the likelihood that users have found this particular site relevant to the entered search terms.
Accordingly, such popular sites are weighted and appear higher in subsequent result lists for the same search terms. The Lycos search engine (www.lycos.com) also uses feedback, but only at the time of crawling, not in ranking of results. In the Lycos search engine, as described in U.S. Pat. No. 5,748,954, priority of crawling is set based upon how many times a listed web site is linked to from other web sites. This idea of using information on links to a page was later exploited by the Clever system developed in research by IBM, and the Google system (www.google.com), which do use such information to rank possible hits for a search query.
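The general idea of weighting results by user selections and time spent at the linked site may be sketched as follows. This is an illustrative assumption only and is not a description of the actual DirectHit implementation; the class FeedbackRanker, the scoring formula, and the dwell_weight value are all hypothetical.

```python
from collections import defaultdict


class FeedbackRanker:
    """Re-ranks results per query using clicks and dwell time (illustrative)."""

    def __init__(self, dwell_weight: float = 0.01):
        # Assumed weighting: each click counts 1.0 plus a small bonus per second.
        self.dwell_weight = dwell_weight
        self.scores = defaultdict(lambda: defaultdict(float))

    def record_click(self, query: str, url: str, dwell_seconds: float) -> None:
        """Accumulate feedback for a result the user actually accessed."""
        self.scores[query.lower()][url] += 1.0 + self.dwell_weight * dwell_seconds

    def rank(self, query: str, candidates: list) -> list:
        """Order candidates by accumulated feedback, highest first."""
        query_scores = self.scores.get(query.lower(), {})
        # sorted() is stable, so results with equal scores keep their order.
        return sorted(candidates, key=lambda u: query_scores.get(u, 0.0), reverse=True)


ranker = FeedbackRanker()
ranker.record_click("clinton", "b.example", dwell_seconds=120)
ranker.record_click("clinton", "a.example", dwell_seconds=5)
print(ranker.rank("clinton", ["a.example", "b.example", "c.example"]))
```

Note that a scheme of this kind rewards whatever users select, which is not necessarily the site they originally intended to reach.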
Even leaving aside the drawbacks discussed above, search engines of both categories are most useful when a user desires a list of relevant web sites for particular search terms. Often, users wish to locate a particular web site but do not know the exact URL of the desired web site. Conventional search engines are not the most efficient tools for doing this.
Moreover, naming and locating particular sites on the Web is currently subject to serious problems. For example, appropriate names, including existing company names or trademarks, may not be available, because someone registered them first. Names may be awkward and not obvious, because of length, form/coding difficulties or variant forms. Further, some names, such as movie titles or individual products, may not justify a separate domain name registration for reasons of cost and convenience.
This problem results from a mismatch between the present network addressing scheme based on Uniform Resource Locators (URLs), which meet the technical needs of the Internet software, and the needs of human users and site sponsors for simple, user-friendly mnemonic and branded names. This problem is largely hidden in cases where a user finds a site by clicking a pre-coded link (such as after using a search engine), or by using a saved bookmark. However, the problem does seriously affect users wishing to find a site directly, or to tell another person how to find it. To do this, the person must know and type the URL into his Internet browser, typically of the form sitename.com or www.sitename.com. Site sponsors are also seriously hampered by this difficulty in publicizing their sites.
Further, the current method of naming and locating Web sites has serious, widely known problems. Web site locator “domain” names are often not simple or easily remembered or guessed, and often do not correspond to company, trademark, brand or other well-known names.
As a result of the foregoing, site URLs (or domain names) are not intuitively obvious in most cases, and incorrect access attempts waste time and produce cryptic error messages that provide no clue as to what the correct URL might be. A significant percentage of searches are for specific, well-known sites. These could be found much more quickly by a special-purpose locator engine. The current mode of interacting with search engines is also cumbersome; for this purpose, a much simplified mode of direct entry is practical.
One attempt to provide the ability to map a signifier, or alias, to a specific URL utilizes registration of key words, or aliases, which, when entered at a specified search engine, are associated with the URL of the registered site. One such commercial implementation of this technique is known as NetWord (www.netword.com). However, the NetWord aliases are assigned on a registration basis, that is, owners of web sites pay NetWord a registration fee to be mapped to by a particular key word. As a result, the URL returned by NetWord may have little or no relation to what a user actually would be looking for. Another key word system, RealNames (www.realnames.com), similarly allows web site owners to register, for a fee, one or more “RealNames” that can be typed into a browser incorporating RealNames' software, in lieu of a URL. Since RealNames also is registration based, there is no guarantee that the URL to which the user is directed will be the one he intended.
Further, in existing preference learning and rating mechanisms, such as collaborative filtering (CF) and relevance feedback (RF), the objective is to evaluate and rank the appeal of the best n out of m sites or pages or documents, where none of the n options are necessarily known to the user in advance, and no specific one is presumed to be intended. It is a matter of interest in any suitable hit, not intent for a specific target. Results may be evaluated in terms of precision (whether “poor” matches are included in the results) and recall (whether “good” matches are omitted from the results).
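For reference, precision and recall are conventionally computed as the fraction of returned items that are relevant and the fraction of relevant items that are returned, respectively. The following minimal sketch shows that conventional calculation; the function name precision_recall and the sample item labels are hypothetical.

```python
def precision_recall(returned: set, relevant: set) -> tuple:
    """Compute (precision, recall) for a set of returned results."""
    hits = returned & relevant  # relevant items that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four results returned, three items actually relevant.
returned = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}
precision, recall = precision_recall(returned, relevant)
print(precision, recall)
```

Here two of the four returned items are relevant, and two of the three relevant items are returned.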
A search for “IBM” may be for the IBM Web site, but it could just as likely be for articles about IBM as a company, or articles with information on IBM-compatible PCs, etc. Typical searches are for information about the search term, and can be satisfied by any number of “relevant” items, any or all of which may be previously unknown to the searcher. In this sense there is no specific target object (page, document, record, etc.), only some open ended set of objects which may be useful with regard to the search term. The discovery search term does not signify a single intended object, but specifies a term (which is an attribute associated with one or more objects) presumed to lead to any number of relevant items. Expert searchers may use searches that specify the subject indirectly, to avoid spurious hits that happen to contain a more direct term. For example, searching for information about the book Gone With The Wind may be better done by searching for Margaret Mitchell, because the title will return too many irrelevant hits that are not about the book itself (but may be desired for some other task).
In other words, the general case of discovery searching that typical search engines are tuned to serve is one where a search is desired to return some number, n, of objects, all of which are relevant. A key performance metric, recall, is the completeness of the set of results returned. The case of a signifier for an object is the special case of n=1. Only one specific item is sought. Items that are not intended are not desired; their relevance is zero, no matter how good or interesting they may be in another context. The top DirectHit result for “Clinton” was a Monica Lewinsky page. That is probably not because people searching for Clinton actually intended to get that page, but because of serendipity and temptation, which is a distraction if what we want is to find the White House Web site.
In addition, CF obtains feedback from a group of users in order to serve each given user on an overall, non-contingent basis, without regard to either the intent of the user at a specific time or the specific context of the request. RF is used by a single user to provide feedback on their intent at a given time, but still with no presumed intent of a single target.
More broadly, searching techniques are generally not optimized based on using a descriptor which is also an identifier—they provide more generally for the descriptor to specify the nature of the content of the target, not its name. There are options in advanced search techniques which allow specification that the descriptor is actually an identifier, such as for searching by title. Such options may be used to constrain the search when a specific target happens to be intended, but no special provision is made to apply feedback to exploit that particular relationship or its singularity.
Moreover, none of the currently available key word systems utilize heuristic techniques actually to determine the site intended by the user. Instead, the current systems teach away from such an approach by their use of registration, rather than user intention, to assign key words to map to web pages. Thus, the current techniques are not directed to solving the problem of finding the one, correct site for a particular signifier.
Thus, the need exists for a system that would enable a user to find a desired Web document by simply entering an intuitive key word or alias, that would perform a one-to-one mapping of the alias with the URL actually desired by the user, and that would use heuristic techniques to assist in providing the correct mapping and in improving system accuracy over time.