1. Field of the Invention
The present invention is directed to an improved method and apparatus for utilizing user feedback, particularized to a specified or inferred task, to improve the accuracy with which a system responds to user commands.
2. Description of the Related Art
The development of the World Wide Web (hereinafter, the Web), a subset of the Internet that includes all connected servers offering access to Hypertext Transfer Protocol (HTTP) space, has greatly increased the popularity of the Internet in recent years. To navigate the Web, browsers have been developed that enable a user of a client computer connected to the Internet to download Web pages (i.e., data files on server electronic systems) written in HyperText Markup Language (HTML). Web pages may be located on the Web by means of their electronic addresses, known as Uniform Resource Locators (URLs), which uniquely identify the location of a resource (web page) within the Web. Each URL consists of a string of characters that specifies the protocol needed to access the resource (e.g., HTTP), a network domain name, the particular computer on which the resource is located, and directory path information within that computer's file structure. The domain name is assigned by Network Solutions Registration Services after completion of a registration process.
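The URL components listed above can be pulled apart with Python's standard urllib.parse module, which calls them the scheme, the host name, and the path. The following is a minimal sketch; the function name describe_url and the example URL are hypothetical, and splitting the host at its first dot into a computer name and a domain name is a simplifying assumption that does not hold for every host.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict[str, str]:
    """Break a URL into the parts described in the text (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Simplifying assumption: the first label names the computer and the rest
    # is the registered domain name. Real host names do not always follow this.
    computer, _, domain = host.partition(".")
    return {
        "protocol": parts.scheme,  # e.g. "http"
        "computer": computer,      # e.g. "www"
        "domain": domain,          # e.g. "example.com"
        "path": parts.path,        # directory path within the server's file structure
    }


if __name__ == "__main__":
    # Hypothetical URL used only for illustration.
    print(describe_url("http://www.example.com/docs/index.html"))
    # → {'protocol': 'http', 'computer': 'www', 'domain': 'example.com', 'path': '/docs/index.html'}
```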
Search engines have been developed to assist persons using the Web in searching for web pages that may contain useful information. One type of search engine, exemplified by AltaVista, Lycos, and HotBot, uses search programs, called “web crawlers”, “web spiders”, or “robots”, to actively search the Web for pages to be indexed, which are then retrieved and scanned to build indexes. Most often this is done by processing the full text of the page and extracting words, phrases, and related descriptors (word adjacencies, frequencies, etc.). This is often supplemented by examining descriptive information about the Web document contained in a tag or tags in the header of a page. Such tags are known as “metatags” and the descriptive information contained therein as “metadata”. Another type of search engine, exemplified by Yahoo (www.yahoo.com), does not use web spiders to search the Web. Instead, these search engines compile directories of web sites that editors deem to be of interest to the users of the service, and searches are performed using only the editor-compiled directory or directories. Both types of search engines output a listing of search results believed to be of interest to the user, based upon the search term or terms that the user entered.
Recently, search engines such as DirectHit (www.directhit.com) have introduced feedback and learning techniques to increase the relevancy of search results. DirectHit purports to use feedback to iteratively modify search result rankings based on which search result links are actually accessed by users. Another factor purportedly used by the DirectHit service in weighting the results is the amount of time the user spends at the linked site. The theory behind such techniques is that, in general, the more people who click on a search result, and the longer they spend at the linked site, the greater the likelihood that users have found that particular site relevant to the entered search terms. Accordingly, such popular sites are weighted more heavily and appear higher in subsequent result lists for the same search terms.
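DirectHit's actual weighting formula is not described here, so the following Python sketch only illustrates the general idea of click- and dwell-time feedback re-ranking. The class name FeedbackRanker, the linear scoring rule, and the weights click_weight and dwell_weight are all hypothetical assumptions. Feedback is keyed by the pair (search terms, URL), so it affects only later searches for the same terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    clicks: int = 0
    dwell_seconds: float = 0.0


class FeedbackRanker:
    """Hypothetical click/dwell-time re-ranker; the weights are illustrative only."""

    def __init__(self, click_weight: float = 1.0, dwell_weight: float = 0.01):
        self.click_weight = click_weight
        self.dwell_weight = dwell_weight
        # Accumulated feedback per (normalized search terms, URL).
        self.stats: defaultdict[tuple[str, str], Feedback] = defaultdict(Feedback)

    def record_visit(self, query: str, url: str, dwell_seconds: float) -> None:
        """Record that a user followed `url` from the results for `query`."""
        fb = self.stats[(query.lower(), url)]
        fb.clicks += 1
        fb.dwell_seconds += dwell_seconds

    def score(self, query: str, url: str) -> float:
        """Assumed linear score: more clicks and longer visits rank higher."""
        fb = self.stats.get((query.lower(), url))
        if fb is None:
            return 0.0
        return self.click_weight * fb.clicks + self.dwell_weight * fb.dwell_seconds

    def rerank(self, query: str, results: list[str]) -> list[str]:
        # sorted() is stable, so results with equal feedback keep their original order.
        return sorted(results, key=lambda url: -self.score(query, url))


if __name__ == "__main__":
    ranker = FeedbackRanker()
    ranker.record_visit("jaguar", "http://b.example", 120)
    ranker.record_visit("jaguar", "http://b.example", 60)
    ranker.record_visit("jaguar", "http://c.example", 5)
    print(ranker.rerank("jaguar", ["http://a.example", "http://c.example", "http://b.example"]))
    # → ['http://b.example', 'http://c.example', 'http://a.example']
```

Because the feedback table is keyed only by the search terms, every user who enters “jaguar” receives the same re-ranked list, whatever task that user is performing.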
The Lycos search engine (www.lycos.com) also uses feedback, but only to prioritize crawling, not to rank results. In the Lycos search engine, as described in U.S. Pat. No. 5,748,954, crawling priority is set based upon how many times a listed web site is linked to from other web sites. The Google search engine (www.google.com) and IBM's Clever system use such link information to rank possible hits for a search query.
Two important techniques available to assist in locating desired Web resources will be referred to hereinafter as discovery searching and signifier mapping. In discovery searching, a user desires all, or a reasonable number, of the web sites highly relevant to the entered search terms. In such searching, the criterion for success is that as many of the highly relevant web sites as possible be discovered and presented to the user as prominently as possible. In signifier mapping, a user enters a guessed name or signifier for a particular target resource on the Web. The criterion for successful signifier mapping is that the user is provided with the URL of, or connected to, the specific target resource sought.
One attempt to provide the ability to map a signifier, or alias, to a specific URL relies on registration of key words, or aliases, which, when entered at a specified search engine, are associated with the URL of the registered site. This technique is implemented commercially by NetWord (www.netword.com). However, NetWord aliases are assigned on a registration basis; that is, owners of web sites pay NetWord a registration fee to have a particular key word mapped to their sites. As a result, the URL returned by NetWord may have little or no relation to what a user actually is looking for. Another key word system, RealNames (www.realnames.com), similarly allows web site owners to register, for a fee, one or more “RealNames” that can be typed into a browser incorporating RealNames' software, in lieu of a URL. Since RealNames also is registration based, there once again is no guarantee that the URL to which the user is directed will be the one he intended.
Related to search techniques are preference learning and rating mechanisms. Such mechanisms have been used, for example, in assessing customer satisfaction or in making recommendations to users based on what customers with similar interests have purchased in the past. In existing preference learning and rating mechanisms, such as collaborative filtering (CF) and relevance feedback (RF), the objective is to evaluate and rank the appeal of the best n out of m sites, pages, or documents, where none of the n options is necessarily known to the user in advance, and no specific one is presumed to be intended. The user is interested in any suitable hit rather than intending to reach a specific target. Results may be evaluated in terms of precision (whether “poor” matches are included) and recall (whether “good” matches are omitted).
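Precision and recall are conventionally computed as fractions: precision is the share of returned items that are relevant, and recall is the share of relevant items that were returned. The following is a minimal Python sketch of those standard definitions. The function name precision_recall and the item identifiers are hypothetical, and duplicate results are counted once.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Standard set-based precision and recall (hypothetical helper name)."""
    returned = set(retrieved)  # count each returned item once
    hits = len(returned & relevant)
    # Precision: how many of the returned items are relevant ("poor" matches lower it).
    precision = hits / len(returned) if returned else 0.0
    # Recall: how many of the relevant items were returned ("good" omissions lower it).
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    print(precision_recall(["a", "b", "c", "d"], {"a", "c", "e"}))
    # → (0.5, 0.6666666666666666)
```

When the relevant set holds exactly one intended item, recall can only be 0 or 1: the single target either appears in the results or it does not.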
For example, a search for “IBM” may be for the IBM Web site, but it could just as likely be for articles about IBM as a company, or for articles with information on IBM-compatible PCs, etc. Typical searches are for information about the search term, and can be satisfied by any number of “relevant” items, any or all of which may be previously unknown to the searcher. In this sense there is no specific target object (page, document, record, etc.), only some open-ended set of objects that may be useful with regard to the search term. The discovery search term does not signify a single intended object, but specifies a term (an attribute associated with one or more objects) presumed to lead to any number of relevant items.
Expert searchers may use searches that specify the subject indirectly, to avoid spurious hits that happen to contain a more direct term. For example, searching for information about the book Gone With The Wind may be better done by searching for Margaret Mitchell, because the title will return too many irrelevant hits that are not about the book itself (but may be desired for some other task).
In other words, the general case of discovery searching that typical search engines are tuned to serve is one in which a search is expected to return some number, n, of objects, all of which are relevant. A key performance metric, recall, is the completeness of the set of results returned. The case of a signifier for an object is the special case of n=1: only one specific item is sought. Items that are not intended are not desired; their relevance is zero, no matter how good or interesting they may be in another context. The top DirectHit result for “Clinton” was a Monica Lewinsky page. That is probably not because people searching for Clinton actually intended to get that page, but because of serendipity and temptation, which is a distraction if what we want is to find the White House Web site.
Many self-contained document search systems, such as Lexis/Nexis and Medline, have long exploited semantic metadata (machine-readable information as to the content and type of an associated document available on a network) to enable users to constrain their searches more effectively. Thus, in searching for the Times review of Stephen King's new book, a user might explicitly search for “pub-name=Times and content-type=review and author=King.” Search systems have enabled searchers to exploit such metadata explicitly in their query languages, and attempts at natural language searching have sought to infer such semantics. However, because of the small user population of such systems, there has been no attempt to utilize feedback to improve search results in such systems.
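A metadata-constrained query of the kind quoted above can be illustrated by treating each field=value clause as an exact-match filter on a document's metadata. The following Python sketch is hypothetical and is not the actual query language of Lexis/Nexis or Medline. The functions parse_query, matches, and search, and the sample records, are illustrative assumptions. Clauses are assumed to be joined by the literal word “and”.

```python
def parse_query(query: str) -> dict[str, str]:
    """Parse 'field=value and field=value ...' into metadata constraints (hypothetical syntax)."""
    constraints: dict[str, str] = {}
    for clause in query.split(" and "):
        field, sep, value = clause.partition("=")
        if not sep:
            raise ValueError(f"expected field=value, got {clause!r}")
        constraints[field.strip()] = value.strip()
    return constraints


def matches(metadata: dict[str, str], constraints: dict[str, str]) -> bool:
    """A document matches only if every constrained field has exactly the requested value."""
    return all(metadata.get(field) == value for field, value in constraints.items())


def search(docs: list[dict], query: str) -> list[str]:
    constraints = parse_query(query)
    return [doc["id"] for doc in docs if matches(doc["metadata"], constraints)]


if __name__ == "__main__":
    # Hypothetical records used only for illustration.
    docs = [
        {"id": "d1", "metadata": {"pub-name": "Times", "content-type": "review", "author": "King"}},
        {"id": "d2", "metadata": {"pub-name": "Times", "content-type": "news", "author": "King"}},
        {"id": "d3", "metadata": {"pub-name": "Post", "content-type": "review", "author": "King"}},
    ]
    print(search(docs, "pub-name=Times and content-type=review and author=King"))
    # → ['d1']
```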
Further, it has been recognized that different people using the same search terms when searching may expect or desire different results. For example, in the context of discovery searching, it has been postulated that when a man enters the search term “flowers” in a search engine, he is likely to be interested in ordering flowers, whereas when a woman enters the same search term, she is more likely to be seeking information about flowers. Some currently existing search engines, such as DirectHit (www.directhit.com) and GlobalBrain (www.globalbrain.net), purport to take gender and other demographic data, such as country, race, and income, into account in formulating results for searches. However, prior art search techniques such as these do not take into account the type of task/domain the user is working in when deciding what results would be desired, nor do the techniques utilize iterative learning based on experiential data or feedback particularized to the task/domain.
There is therefore a need to provide a method for calibrating the use of feedback in searching and other command-responsive control techniques, such as robot control, so as to correlate accumulated user feedback with the particular task/domain being performed by the user.
There is also a need for a technique that uses semantic metadata in search systems having a large user population to help determine the task/domain of the user, and then applies feedback specific to that task/domain.