1. Field of the Invention
The present invention relates to a method and system for recommending relevant items to a user of an electronic network. More particularly, the present invention relates to a means of analyzing the text of documents of interest and recommending a set of documents with a high measure of statistical relevancy.
2. Description of the Related Art
Most personalization and web user analysis (also known as “clickstream”) technologies work by having the system record selected web pages that a user has viewed, typically in a web log. A web log entry records which user looked at which web page in the site. A typical web log entry consists of two major pieces of information: first, some form of user identifier such as an IP address, a cookie ID, or a session ID, and second, some form of page identifier such as a URL, file name, or product number. Additional information may be included, such as the page from which the user arrived and the time at which the user requested the page. The web log entries are collected in a file system of a web server and analyzed using software to produce charts of page requests per day, most visited pages, and the like. Such software typically relies on simple aggregations and summarizations of page requests rather than any analysis of the internal page structure and content.
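By way of illustration only, the kind of aggregation described above may be sketched as follows. The log entries, field layout, and function names are hypothetical and are not taken from any particular web server or analysis product; each entry is assumed to carry a user identifier, a page identifier, and a request time.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log entries: (user identifier, page identifier, request time).
LOG = [
    ("cookie-17", "/products/42", "2000-03-01T09:15:00"),
    ("cookie-17", "/products/43", "2000-03-01T09:16:30"),
    ("10.0.0.8", "/products/42", "2000-03-01T11:02:10"),
    ("cookie-99", "/index.html", "2000-03-02T08:00:00"),
    ("cookie-99", "/products/42", "2000-03-02T08:01:45"),
]


def requests_per_day(log):
    """Count page requests grouped by calendar day."""
    return Counter(
        datetime.fromisoformat(ts).date().isoformat() for _, _, ts in log
    )


def most_visited(log, n=3):
    """Return the n most requested pages with their request counts."""
    return Counter(page for _, page, _ in log).most_common(n)
```

Both functions consult only the identifiers and timestamps in each entry; neither examines the words on the requested page, which is the limitation noted above.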
Other personalization software also relies on the concept of web logs. The dominant technology is collaborative filtering, which works by observing the pages of the web site a user requests, searching for other users that have made similar requests, and suggesting pages that these other users requested. For example, if a user requests pages 1 and 2, a collaborative filtering system would find others who did the same. If the other users on average also requested pages 3 and 4, the collaborative filtering system would offer pages 3 and 4 as its best recommendations. Other collaborative filtering systems use statistical techniques to perform frequency analysis, or apply more sophisticated prediction methods such as neural networks. Examples of collaborative filtering systems include NETPERCEPTIONS™, LIKE MINDS™, and WISEWIRE™. Such a system in action can be viewed at AMAZON.COM™.
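The page 1 and 2 example above may be sketched, under simplifying assumptions, as a frequency count over the request histories of matching users. The data, the exact-superset matching rule, and the name `recommend` are hypothetical; this is not the method of any of the named commercial systems, which may use different similarity measures.

```python
from collections import Counter

# Hypothetical request histories: user identifier -> set of requested pages.
HISTORY = {
    "u1": {1, 2, 3, 4},
    "u2": {1, 2, 3},
    "u3": {1, 2, 4},
    "u4": {5, 6},
}


def recommend(requested, history, n=2):
    """Suggest pages requested by users whose history covers `requested`."""
    requested = set(requested)
    counts = Counter()
    for pages in history.values():
        if requested <= pages:  # this user made the same requests
            counts.update(pages - requested)
    # Sort by descending frequency, then by page id for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:n]]
```

With the sample data, a user who requested pages 1 and 2 matches three other users, and pages 3 and 4 are the most frequent additional requests among them. A request for a page that appears in no history yields an empty list, since there are no matching users to draw on.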
Other types of collaborative filtering systems allow users to rank their interest in a group of documents. The users' answers are collected to develop a user profile that is compared to other user profiles. Documents viewed by other users with a matching profile are then recommended to the user. This approach may use artificial intelligence techniques such as incremental learning methods to improve the recommendations based on user feedback. Systems using this approach include SITEHELPER™, SYSKILL & WEBERT™, FAB™, LIBRA™, and WEBWATCHER™. However, collaborative filtering is ineffective for personalizing documents with dynamic or unstructured content. For example, each auction in an auction web site or item offered in a swap web site is different and may have no logged history of previous users to which collaborative filtering can be applied. Collaborative filtering is also ineffective for infrequently viewed documents or for offerings of interest to only a few site visitors.
Clearly, there is a need for a system that considers not only the identifiers of the pages the user viewed but also the words in those pages in order to make more focused recommendations to the user. Broadening the concept of pages to documents in general, there is a need for a recommendation system that analyzes the words in the documents in which a user has expressed interest. Such a recommendation system should support the options of residing on the same computer as the web site, on a remote server, or on an end user's computer. Furthermore, the system should be able to access documents from external sources, such as other web sites throughout the Internet or private networks. A flexible recommendation system should also support a scalable architecture that either uses a proprietary text search engine or leverages the search engines of other web sites or generalized Internet-wide search engines.