1. Technical Field
The invention relates to electronic access to information. More particularly, the invention relates to a method and apparatus for context-based content recommendation.
2. Description of the Prior Art
One problem with finding information in an electronic network is how to connect people as quickly and effectively as possible with the information/products/services that meet their needs. This has been one of the main goals of Web pages and search engines since the beginning of the World Wide Web. Failure to do so leads to lost business on eCommerce, eTravel, and eMarketing sites; frustrated customers on eSupport sites, who are then likely to call customer support, wasting a great deal of the company's money; uninterested viewers and readers on eMedia sites, who quickly abandon the site, costing the site advertising revenue; and unproductive employees on intranets.
Web site design is a manual attempt to solve the problem of information discovery: organizing information in a way that the designer imagines will help users find what they are looking for. While effective in some cases, finding information in this way is often slow and ineffective, as users resort to poking around a site looking for the information they need. Most users abandon a site if they do not find what they are looking for within three clicks. One problem is that the site is static. In more recent years, Web analytics has emerged as an attempt to alleviate this problem. Designers can see all of the actions that happen on their site and collect them into reports that aim to provide guidance on how the site can be redesigned or reconfigured more effectively. While providing some benefit, this information is often ambiguous and offers only hints rather than concrete suggestions for improvement. At best, the process is tedious, requires a great deal of manual effort as designers rework the site in line with what they have learned, and takes a long time. The feedback loop is thus slow and ineffective.
Automatic content recommendation is a completely different strategy that emerged very early in the life of the Web. Search engines, such as Google, Yahoo, and Ask, are the common manifestation of such techniques. The basic idea is that the user explicitly describes what they are looking for in the form of a search query, and an automatic process attempts to identify the piece of content, most often a Web page, that best matches the query. In its basic form, this amounts to examining all candidate documents and recommending those in which the query terms occur with the highest frequency, i.e. keyword match. Modern adaptations of this basic technique add layers of sophistication, e.g. natural language processing, but these approaches still rely on properties of the content itself, e.g. the words within the document, to determine the final relevancy ranking. This represents the first, content-centric phase of content recommendation (see FIG. 1).
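The basic keyword-match ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not any search engine's actual method: the tokenizer (lowercase alphanumeric runs), the scoring rule (raw count of query-term occurrences), and the names `tokenize` and `keyword_rank` are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters/digits (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents by how often the query terms occur in their text."""
    terms = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:  # documents with no keyword match are not returned
            results.append((doc_id, score))
    # Highest keyword frequency first; ties broken by document id for stable output.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


docs = {
    "a": "Digital camera reviews and camera buying guide",
    "b": "Camera lens cleaning tips",
    "c": "Guitar lessons for beginners",
}
print(keyword_rank("camera guide", docs))  # [('a', 3), ('b', 1)]
```

Note that the score depends only on the text of each document; nothing about who wrote it, who used it, or whether it is current enters the ranking.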
Many variations of this approach exist, most notably meta-tagging. In this approach, the content creator selects a small number of terms to describe the content. These terms are embedded within the content, often as HTML meta-tags, but are not necessarily made visible to the consumer of the content. This is one way to allow search engines to search content that is not text-based, such as video clips. This approach was very common in the late 1990s but has since fallen out of favor because of the enormous effort required to keep the meta-tags up to date and in sync with changes to the content.
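To illustrate how meta-tags describe content without being visible to its consumer, the following sketch pulls the terms out of a `<meta name="keywords" ...>` tag using Python's standard `html.parser` module. It assumes the conventional comma-separated `keywords` meta tag; the class and function names are hypothetical. The video itself is never examined; only the creator-supplied terms are available for searching.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects terms from <meta name="keywords" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                term.strip().lower() for term in content.split(",") if term.strip()
            )


def meta_keywords(html: str) -> list[str]:
    parser = MetaKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


page = """<html><head>
<meta name="keywords" content="Horror, Thriller, 1980s">
<title>Trailer</title>
</head><body><video src="trailer.mp4"></video></body></html>"""
print(meta_keywords(page))  # ['horror', 'thriller', '1980s']
```

If the video is later re-edited, nothing in this process notices; the terms stay whatever the creator last typed, which is the maintenance burden described above.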
In many ways, this first content-centric approach makes a lot of sense on the surface, i.e. if you want to recommend content, consider the content itself. A key problem with this approach is that it often returns many documents that may be relevant but not useful. Many documents may exhibit a strong keyword match but be outdated or not truly relevant to the user's current interest. If users do not find a useful result within the first few results, they are likely to abandon the search.
Keyword match does not really reflect how we find information most efficiently in the real world. In day-to-day life, the best way to find the information/products/services we are looking for is to ask someone who knows to point us in the right direction. The second phase of content recommendation thus shifts the focus from content to users (see FIG. 1). Google's “PageRank” algorithm, though we place it in phase 1, was really a transitional technology that heralded the coming of phase 2. The PageRank algorithm's breakthrough was to consider not only the content of a page itself, but also how other Web site designers had linked to it from their own pages. This represented a form of voting on the importance of Web pages: pages that were linked to more often were seen as more valuable. Although this brought people into the equation, the people who were voting were Web designers rather than the consumers of the content, i.e. the users. Phase 2 of content recommendation is all about the users. The three best-known approaches in phase 2 are folksonomy, profiling/behavioral targeting, and collaborative filtering.
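The link-voting idea behind PageRank can be illustrated with the widely published power-iteration formulation. This is a simplified textbook sketch, not Google's production system: the damping factor of 0.85, the even redistribution of rank from pages with no outgoing links, the convergence tolerance, and the example link graph are all assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            if targets:
                # Each outgoing link is a "vote" worth an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


site_links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "blog": ["home"],
}
ranks = pagerank(site_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.4f}")
```

Because `home` receives links from every other page, it ends up with the highest score, while `blog`, which no page links to, keeps only the baseline share `(1 - damping) / n`. The votes come entirely from the link structure authored by site designers, not from the people reading the pages.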
Folksonomy
The first, folksonomy, represents the most straightforward addition to phase 1. Here, users are allowed to tag content themselves. So, rather than the Web site designers, or a single designer, being responsible for coming up with the best set of keywords to describe the content, folksonomy lets the community do it. Once this is done, the community-created tags essentially become part of the content and can be searched using the traditional information retrieval/search techniques developed in phase 1. A big assumption in this approach is that the subset of the community that takes the time to tag pages explicitly ultimately produces a description that is valid and representative of the larger community's opinion. This is often not the case.
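A minimal sketch of folksonomy-style search, assuming community tags arrive as `(user, item, tag)` triples: tags are normalized, indexed, and then searched like keywords, ranking items by how many distinct users applied the tag. The data layout and the function names are hypothetical. The ranking reflects only the users who chose to tag, which is the representativeness assumption noted above.

```python
from collections import defaultdict


def build_tag_index(taggings: list[tuple[str, str, str]]) -> dict[str, dict[str, int]]:
    """Map each normalized tag to {item: number of distinct users who applied it}."""
    taggers = defaultdict(lambda: defaultdict(set))
    for user, item, tag in taggings:
        taggers[tag.strip().lower()][item].add(user)
    return {
        tag: {item: len(users) for item, users in items.items()}
        for tag, items in taggers.items()
    }


def search_by_tag(index: dict[str, dict[str, int]], tag: str) -> list[str]:
    """Return items carrying the tag, most widely applied first."""
    items = index.get(tag.strip().lower(), {})
    return sorted(items, key=lambda item: (-items[item], item))


taggings = [
    ("u1", "clip42", "Funny"),
    ("u2", "clip42", "funny"),
    ("u2", "clip42", "funny"),  # a repeated tag by the same user counts once
    ("u3", "clip7", "funny"),
    ("u1", "clip7", "cats"),
]
index = build_tag_index(taggings)
print(search_by_tag(index, "funny"))  # ['clip42', 'clip7']
```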
Profiling/Behavioral Targeting
Profiling/Behavioral targeting in its common form also borrows heavily from phase 1 techniques. Here, based on a user's prior behavior on a site, e.g. the pages clicked or products purchased, a profile is built for that user. This profile may, in the simple case, be based on a collection of pages clicked or products purchased. The profile may also make use of the content itself or meta-tags to attempt to infer the user's historical topics of interest. For example, if a user purchased many films tagged as “horror” by content providers in the past, then a behavioral targeting system would tend to recommend more “horror” films to the user. A major assumption here is that a user's historical behaviors are a good predictor of future interest. While sometimes true, this assumption tends to fail at least as often as it holds. The reason for failure is that people exhibit a variety of behaviors depending on their current interests, context, and goals. For example, someone who bought a few books on guitar as a one-time gift for his wife a few weeks ago might continue to be recommended guitar books by a behavioral targeting approach, even though he may no longer have any interest in that topic. Profiling approaches often also take into account demographic data of users, such as age, gender, and geographic location. The core belief underlying such approaches is: If I only knew enough about a user, I could predict exactly what they want. However, some basic introspection uncovers the fallacy underlying this approach. For example, I may know more about my wife than any other person or machine does. I am in this way the ideal profiling system for her. However, I am unable to predict what she might be currently looking for online without some context.
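The following sketch shows a simple tag-based behavioral profile, assuming each catalog item carries content-provider tags and the profile is just a count of tags over past purchases. The catalog, item names, and scoring rule are illustrative assumptions, not any particular system's method. In the example, two guitar books bought as a gift dominate the profile, so the sketch keeps recommending guitar titles regardless of the user's current interest.

```python
from collections import Counter


def build_profile(history: list[str], catalog: dict[str, set[str]]) -> Counter:
    """Count how often each content tag appears among the user's past purchases."""
    profile: Counter = Counter()
    for item in history:
        profile.update(catalog.get(item, set()))
    return profile


def recommend(history: list[str], catalog: dict[str, set[str]], k: int = 2) -> list[str]:
    """Score items not yet purchased by the summed profile weight of their tags."""
    profile = build_profile(history, catalog)
    purchased = set(history)
    scored = []
    for item, tags in catalog.items():
        if item in purchased:
            continue
        score = sum(profile[tag] for tag in tags)
        if score > 0:
            scored.append((score, item))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


catalog = {
    "guitar_basics": {"guitar", "music"},
    "chord_book": {"guitar", "music"},
    "jazz_guitar": {"guitar", "music", "jazz"},
    "fingerpicking": {"guitar"},
    "java_intro": {"programming", "java"},
    "clean_code": {"programming"},
}
# Two guitar books bought weeks ago as a gift, plus one book for the user's own work.
history = ["guitar_basics", "chord_book", "java_intro"]
print(recommend(history, catalog))  # ['jazz_guitar', 'fingerpicking']
```

Nothing in the profile records why the guitar books were bought or whether that interest is still current, so old behavior keeps outweighing the user's present context.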
Collaborative Filtering
Collaborative filtering is another user-centric approach, and arguably the most strictly user-centric of the three. Here, users are compared to one another based on common purchases, click histories, or explicit ratings. For example, a movie site may use a person's previous movie ratings to find the other people whose ratings most agree with that person's, and then recommend other movies that those people liked. Standard “people who bought this also bought that” approaches are actually a variation on the collaborative filtering approach in which a user's most recent action serves as the sole basis for identifying similar users. This approach was made popular by Amazon's recommendation engine. A big assumption in this approach is that a global similarity measure between users, based on past behavior, is a useful way to predict future interest. This assumption is flawed, however. One may be very similar to some of one's co-workers in a work context, e.g. they are all Java engineers with similar interests regarding programming, but quite different from those co-workers outside the office, on the golf course for instance. In the context of golf, one likely has a very different peer group. Grouping users at a global level is more often misleading than helpful.
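User-based collaborative filtering of the kind described above can be sketched as follows. The use of cosine similarity over explicit ratings, the fixed number of neighbors, and the similarity-weighted scoring are common textbook choices assumed here, not a description of any particular vendor's engine. Note that `cosine` yields a single global similarity per pair of users, regardless of context, which is exactly the assumption criticized above.

```python
from math import sqrt

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two rating vectors; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cf_recommend(ratings: Ratings, user: str, neighbors: int = 2) -> list[str]:
    """Recommend items that the most similar users rated but `user` has not."""
    mine = ratings[user]
    # One global similarity score per pair of users, independent of context.
    peers = sorted(
        ((cosine(mine, theirs), other) for other, theirs in ratings.items() if other != user),
        reverse=True,
    )[:neighbors]
    scores: dict[str, float] = {}
    for similarity, other in peers:
        for item, rating in ratings[other].items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=lambda item: (-scores[item], item))


ratings: Ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 5, "Heat": 5, "Ronin": 4, "Up": 1},
    "cat": {"Alien": 1, "Up": 5, "Cars": 5},
    "dan": {"Heat": 4, "Ronin": 5, "Alien": 4},
}
print(cf_recommend(ratings, "ann"))  # ['Ronin']
```

With `neighbors=2`, `ann` is matched to `bob` and `dan`, whose ratings agree with hers, and `cat`'s favorite is never considered; the same neighbors would be used no matter what `ann` happened to be looking for at the moment.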
Another weakness in all of the user-centric approaches in phase 2 is the reliance on either explicit measures of liking or overly simplistic implicit measures. Explicit measures include asking the user to indicate how much they like a particular piece of content, e.g. on a 1-5 scale. Such approaches are almost always biased because the ratings come from a very small percentage of the population. Further, the people who take the time to provide these ratings are not representative of the community as a whole. They tend to be very opinionated or reflect a specific personality type that is willing to spend the time to voice an opinion.
Approaches that leverage implicit observations, as a rule, look at either clicks or purchases. Clicks are a flawed way to assess liking because whether someone clicks on a result has a lot more to do with an intriguing, perhaps even ambiguous, title and its location on the page. A click tells one nothing about how a user felt about the content once it was selected for viewing. At the other extreme, many systems use purchases as a measure of liking. While purchases are a reasonable way to assess liking, they are too limited. For example, when buying a camera, one may seriously consider a number of products before making a decision. All of that information could be valuable to others interested in cameras, above and beyond the one camera ultimately purchased.