1. Field of the Invention
The present invention relates generally to a method and system for creating a personalized display for a user of an electronic network. More specifically, the present invention relates to a method and system for determining a user's interests from the content of electronic documents viewed by the user and providing recommended documents and recommendation packages to a user based upon the determined interests.
2. Description of Related Art
The number of Internet users continues to increase at an explosive rate. The World Wide Web (“Web”) has therefore now become a significant source of information, as well as products and services.
As the number of Web users rises, Internet commerce (also referred to as “e-commerce”) companies and content providers are increasingly searching for strategies to target their information, products and services to those Web users. One technique that is currently being used to provide Web users with more relevant and timely information is “personalization.”
Personalization can include sending a user an e-mail message tailored to that user, or providing customized Web pages that display information selected by, or considered of interest to, the user. Personal merchandising, in which a unique view of an online store featuring offerings targeted by customer profile is displayed, is another effective personalization technique. Personalization facilitates the targeting of relevant data to a select audience and can be a critical factor in determining the financial success of a Web site.
Internet companies wishing to create highly personalized sites are currently poorly served by both personalization technology vendors and customer relationship marketing product vendors. Each of these vendors offers only part of the overall solution. In addition, a significant investment of time and resources by the client is required to deploy these current solutions.
Most prior art personalization and Web user behavior (also known as clickstream) analysis technologies maintain a record of select Web pages that are viewed by users. This record, known as the “Web log,” records which users looked at which Web pages in the site. A typical Web log entry includes some form of user identifier, such as an IP address, a cookie ID or a session ID, as well as the Uniform Resource Locator (“URL”) the user requested, e.g. “index.html.” Additional information, such as the time the user requested the page or the page from which the user linked to the current Web page, can also be stored in the Web log.
Traditionally, such data has been collected in the file system of a Web server and analyzed using software, such as that sold by WebTrends and Andromedia. These analyses produce charts displaying information such as the number of page requests per day or the most visited pages. No analysis is performed of the internal Web page structure or content. Rather, this software relies on simple aggregations and summarizations of page requests.
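The kind of aggregation described above can be illustrated with a minimal sketch. The `LogEntry` record, its field names, and the sample entries are hypothetical and chosen only for illustration; they do not reflect the log format or implementation of any particular product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One Web log entry (hypothetical layout, for illustration only)."""
    user_id: str  # e.g. an IP address, a cookie ID or a session ID
    url: str      # the URL the user requested
    day: str      # date of the request, as an ISO string


def requests_per_day(entries):
    """Count the page requests made on each day."""
    return Counter(entry.day for entry in entries)


def most_visited(entries, n=3):
    """Return the n most requested URLs with their request counts."""
    return Counter(entry.url for entry in entries).most_common(n)


# Invented sample data; not taken from any real Web log.
SAMPLE_LOG = [
    LogEntry("cookie-1", "index.html", "2000-01-01"),
    LogEntry("cookie-1", "books.html", "2000-01-01"),
    LogEntry("cookie-2", "index.html", "2000-01-01"),
    LogEntry("cookie-2", "index.html", "2000-01-02"),
]

if __name__ == "__main__":
    print(requests_per_day(SAMPLE_LOG))
    print(most_visited(SAMPLE_LOG, n=1))
```

Note that the sketch only counts requests. It never opens a page or inspects its content, which is the limitation the paragraph above points out.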
The prior art personalization methods also rely on the use of Web logs. One technology used in prior art personalization methods is the trend analysis method known as collaborative filtering. Examples of collaborative filtering systems are those of Net Perceptions (used for Amazon.com's book recommendations), Microsoft's Firefly, Personify, Inc., and HNC Software Inc.'s eHNC.
One method of collaborative filtering is trend analysis. In trend analysis collaborative filtering, the pages requested by a user are noted, and other users that have made similar requests are identified. Additional Web pages that these other users have requested are then recommended to the user. For example, if User A bought books 1 and 2 from an on-line bookseller, a collaborative filtering system would find other users who had also bought books 1 and 2. Suppose the collaborative filtering system locates 10 other users who, on average, also bought books 3 or 4. Based upon this information, books 3 and 4 would be recommended to User A.
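The book example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method of any named vendor: "similar" users are assumed to be those whose purchases include everything the target user bought, and the function name `recommend` and the purchase data are hypothetical.

```python
from collections import Counter


def recommend(target, purchases, top_n=2):
    """Recommend items bought by users who also bought all of the target's items.

    `purchases` maps a user name to the set of items that user bought.
    """
    target_items = purchases[target]
    counts = Counter()
    for user, items in purchases.items():
        # Skip the target and any user who did not buy all of the target's items.
        if user == target or not target_items <= items:
            continue
        # Tally only the items the target does not already have.
        counts.update(items - target_items)
    # Most frequently co-purchased first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Invented purchase histories for illustration.
PURCHASES = {
    "A": {"book1", "book2"},
    "B": {"book1", "book2", "book3"},
    "C": {"book1", "book2", "book3", "book4"},
    "D": {"book1", "book5"},
}

if __name__ == "__main__":
    print(recommend("A", PURCHASES))
```

With this data, users B and C match User A, so books 3 and 4 are recommended. If no other user shares the target's history, the function returns an empty list.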
Another type of collaborative filtering asks the users to rank their interest in a document or product. The answers to the questions form a user profile. The documents or products viewed by other users with a similar user profile are then recommended to the user. Systems using this technique include Reel.com's recommendation system.
However, collaborative filtering is not an effective strategy for personalizing dynamic content. As an example, each auction of a Web-based auction site is new and therefore there is no logged history of previous users to which the collaborative filtering can be applied. In addition, collaborative filtering is not very effective for use with infrequently viewed pages or infrequently purchased products.
Another technique used to personalize Internet content is to ask the users to rank their interests in a document. Recommendations are then made by finding documents similar in proximity and in content to those in which the user has indicated interest. These systems may use an artificial intelligence technique called incremental learning to update and improve the recommendations based on further user feedback. Systems using this technique include SiteHelper (Ngu and Wu, 1997), Syskill & Webert (Pazzani et al., 1996), Fab (Balabanovic, 1997), Libra (Mooney, 1998) and Web Watcher (Armstrong et al., 1995).
Another technique that has been used to personalize Internet content is link analysis. Link analysis is used by such systems as the search engine Direct Hit and Amazon.com's Alexa®. The prior art link analysis systems are similar to the trend analysis collaborative filtering systems discussed previously. In the link analysis systems, however, the URL of a web page is used as the basis for determining user recommendations.
Other prior art personalization methods use content analysis to derive inferences about a user's interests. One such content analysis system is distributed by the Vignette Corporation. In the content analysis method, pages on a client's Web site are tagged with descriptive keywords. These tags permit the content analysis system to track the Web page viewing history of each user of the Web site. A list of keywords associated with the user is then obtained by determining the most frequently occurring keywords from the user's history. The content analysis system searches for pages that have the same keywords for recommendation to the user.
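A minimal sketch of the keyword-tag approach described above follows. The page URLs, the tags, and the function names are hypothetical, and the sketch assumes the tags have already been assigned by hand; it is not the implementation of any commercial system.

```python
from collections import Counter

# Manually assigned tags per page (invented for illustration).
PAGE_TAGS = {
    "/gardening/roses": {"gardening", "flowers"},
    "/gardening/tools": {"gardening", "tools"},
    "/gardening/soil": {"gardening", "soil"},
    "/diy/shelves": {"tools", "woodworking"},
    "/cooking/pasta": {"cooking"},
}


def user_keywords(history, page_tags, k=2):
    """Return the k keywords occurring most often across the pages in `history`."""
    counts = Counter()
    for url in history:
        counts.update(page_tags.get(url, ()))
    # Most frequent first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [keyword for keyword, _ in ranked[:k]]


def recommend_pages(history, page_tags, k=2):
    """Recommend unseen pages that share at least one of the user's top keywords."""
    keywords = set(user_keywords(history, page_tags, k))
    seen = set(history)
    return sorted(
        url for url, tags in page_tags.items()
        if url not in seen and tags & keywords
    )


if __name__ == "__main__":
    history = ["/gardening/roses", "/gardening/tools"]
    print(user_keywords(history, PAGE_TAGS))
    print(recommend_pages(history, PAGE_TAGS))
```

A page that is absent from `PAGE_TAGS` contributes nothing to the user's keywords and can never be recommended, which shows how the approach depends on every page being tagged.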
These prior art content analysis systems are subject to several disadvantages. First, tagging each page on the client's Web site requires human intervention. This process is time-consuming and subject to human error. Second, the prior art content analysis systems can only offer recommendations from predefined categories. Furthermore, the prior art content analysis systems require a user to visit the client's Web site several times before sufficient data has been obtained to perform an analysis of the user's Web page viewing history.
Other prior art content analysis systems automatically parse the current document and represent it as a bag of words. The systems then search for other similar documents and recommend the located documents to the user. Such systems include Letizia (Lieberman, 1995) and Remembrance Agent (Rhodes, 1995). These content analysis systems base their recommendations only on the current document. The content of the documents in the user's viewing history is not used.
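The bag-of-words representation can be sketched as follows. The text above does not say how similarity is measured in the cited systems; cosine similarity over raw word counts is used here purely as an assumption, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Represent a document as word counts, ignoring word order and case."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count bags (assumed similarity measure)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(current_text, candidates):
    """Return the name of the candidate document most similar to the current one.

    `candidates` maps a document name to its text.
    """
    current = bag_of_words(current_text)
    return max(
        candidates,
        key=lambda name: cosine_similarity(current, bag_of_words(candidates[name])),
    )


if __name__ == "__main__":
    docs = {
        "roses": "roses need sun and water",
        "stocks": "stock prices fell today",
    }
    print(most_similar("how to water roses", docs))
```

Only the current document is passed to `most_similar`; nothing from earlier pages in the user's viewing history enters the comparison.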
Many Web sites offer configurable start pages for their users. Examples of configurable start pages include My Yahoo! and My Excite. To personalize a start page using the prior art method, the user fills in a form describing the user's interests. The user also selects areas of interest from predefined categories. The user's personalized start page is then configured to display recommendations such as Web pages and content-based information that match the selected categories.
This prior art method, however, is not automated. Rather, the user's active participation is required to generate the personalized Web start page. Furthermore, pages on the client's Web site must be tagged to be available as a recommendation to the user. In addition, recommendations can only be offered from predefined categories. Thus, the prior art personalized start pages may not provide relevant content to users who have eclectic interests or who are not aware of or motivated to actively create a personalized start page.
Content Web sites are increasingly generating income by using advertising directed at users of the Web sites. In the prior art, advertising was targeted to users by using title keywords. In this method, keywords in the title of a Web page or otherwise specified by the author of the page are compared with the keywords specified for a particular advertisement. Another technique used is to associate specific ads with categories in a Web site. For example, advertisements for toys might be associated with Web site categories related to parenting. However, these prior art methods require human intervention to select the keywords or to determine the associations of advertisements with particular categories. Furthermore, the prior art methods cannot readily be used to target advertisements to dynamic content.
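Title-keyword targeting can be sketched as a simple set comparison. The advertisement names, their keyword lists, and the `select_ad` function are hypothetical; the keyword lists stand in for the hand-selected keywords that the paragraph above says these methods require.

```python
import re


def title_keywords(title):
    """Extract lowercase keywords from a page title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def select_ad(title, ads):
    """Return the name of the ad sharing the most keywords with the title.

    `ads` maps an advertisement name to its manually specified keyword set.
    Returns None when no advertisement shares any keyword with the title.
    """
    words = title_keywords(title)
    best_name, best_overlap = None, 0
    for name in sorted(ads):  # sorted so that ties resolve deterministically
        overlap = len(words & ads[name])
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


# Invented advertisements with hand-picked keywords.
ADS = {
    "toy-store": {"toys", "kids", "parenting"},
    "garden-center": {"gardening", "plants"},
}

if __name__ == "__main__":
    print(select_ad("Parenting Tips for Busy Kids", ADS))
    print(select_ad("Stock Market News", ADS))
```

The second call returns `None` because no advertisement keyword appears in the title, which illustrates why pages whose titles fall outside the chosen keywords receive no targeted advertisement.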
It would therefore be an advantage to provide a method and system for providing Internet end users with relevant and timely information that is rapid to deploy and that is easy and inexpensive for client Web sites to use. It would be a further advantage if such method and system were available to automatically and dynamically determine the interests of a user and recommend relevant content to the user. It would be yet another advantage if such method and system were available to provide for a user a personalized recommendation package, such as an automatically generated start page for each user who visits a Web site.