It is becoming increasingly difficult to find information or shop online due to the size and diversity of the Web. In general, it is simply not possible to examine all available alternatives. For instance, on a typical day approximately 13 million individual products are listed on eBay. Searching a repository of this size can be frustrating and unproductive. This problem is exacerbated when the user cannot articulate specific properties of the item that is being sought, or if the user does not know exactly what is being sought, i.e. the user has the ‘I'll know it when I see it’ feeling. Moreover, many users enjoy arriving at the right item serendipitously, a concept difficult to incorporate into the classical notion of information search. Accordingly, effective technologies are needed to help users locate items (i.e. information, products or services).
Currently, there are three commonly employed methods of assisting online users in locating relevant items: search engines, taxonomies and recommender systems. Search engines index documents based on search terms, and are widely used for general Web searches. They allow users who can articulate what they are seeking to find items of that nature reasonably quickly. However, search engines retrieve items based on low-level features (the existence of keywords), while people evaluate and use documents based on high-level concepts (such as topic). Furthermore, some research suggests that the algorithms employed by search engines are not always accurate (Hawking, 1999), especially those of site-specific search engines (“Most Site Searches Ineffective”, 2003), that is, engines that search only the pages or items of a particular website. In addition, search engines require that the user articulate something about the target item, which presupposes that the user has a target item in mind. Therefore, they are of limited usefulness to users who have a less clear picture of what they are seeking.
Taxonomies are fixed groupings of items based on a predetermined set of categories, and are commonly used to support browsing of online repositories. Taxonomies are appropriate and useful only to the extent that the chosen categories correspond to the way in which users classify the items in question (Parsons and Wand, 1997).
Automated mechanisms have the potential to help users locate the information and/or products they are seeking. Tools that improve the product/user match (i.e. decision quality) without increasing search time or cognitive effort, or that decrease search time without decreasing the product/user match, are particularly valuable. Recommender systems are one such type of tool.
Many online stores use recommender systems to personalize websites (Pine et al., 1999). Recommender systems, which provide item suggestions to users, can overcome some of the difficulties experienced by search engines and taxonomies. They have the potential to infer the high-level concepts relevant to a user and locate relevant items in a search space organized according to concepts extracted from items or item descriptions. (See Deerwester et al., 1990 for more on extracting concepts.) Moreover, recommenders need not rely on classification; they can ignore predefined categories, focusing instead on relevant properties. Recommenders can also enhance e-tailing by converting browsers into buyers, increasing cross-sells and building loyalty. Finally, recommender systems may prove useful for selecting the most relevant content, especially when display screens are small, for example, when delivering news to a mobile device (Billsus et al., 2002). Recommenders have been applied in many different contexts, including products, services and information.
Most current recommender systems are based on the notion that similar users have similar goals. The most popular method of exploiting this relationship, collaborative filtering (CF), involves recommending to the current user the pre-identified goals of previous, similar users. The known goals of previous users are domain-dependent and may be operationalized on an information-oriented site as the last visited page, or on an e-commerce site as products purchased. Recommenders use a variety of user-to-user similarity measures, but most build a two-dimensional ratings matrix with items on one dimension and users on the other.
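The two-dimensional ratings matrix can be illustrated with a minimal sketch. The function names, the user and item labels and the 1-5 rating scale below are hypothetical and chosen only for illustration; the sketch simply arranges (user, item, rating) triples into a user-by-item grid in which unrated cells remain empty.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]


def build_ratings_matrix(
    ratings: List[Tuple[str, str, float]],
) -> Tuple[List[str], List[str], Matrix]:
    """Arrange (user, item, rating) triples as a user-by-item matrix.

    Cells for which no rating exists are left as None.
    """
    users = sorted({user for user, _, _ in ratings})
    items = sorted({item for _, item, _ in ratings})
    row = {user: r for r, user in enumerate(users)}
    col = {item: c for c, item in enumerate(items)}
    matrix: Matrix = [[None] * len(items) for _ in users]
    for user, item, value in ratings:
        matrix[row[user]][col[item]] = value
    return users, items, matrix


def fraction_empty(matrix: Matrix) -> float:
    """Share of user/item cells that hold no rating."""
    cells = [cell for matrix_row in matrix for cell in matrix_row]
    return sum(cell is None for cell in cells) / len(cells)


# Hypothetical data: three users, four items, ratings on a 1-5 scale.
sample = [
    ("ann", "book", 5.0), ("ann", "lamp", 3.0),
    ("bob", "book", 4.0), ("bob", "desk", 2.0),
    ("cat", "vase", 1.0),
]
users, items, matrix = build_ratings_matrix(sample)
```

Even in this toy example more than half of the cells are empty, since each user has rated only one or two of the four items.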
Most CF-based systems require explicit user ratings and a large quantity of usage history to function effectively, and those that rely on implicit ratings use primitive heuristics to estimate them. CF suffers from several limitations, including sparsity, the cold start problem, the first rater problem, limited scalability and reliance on explicit ratings. Most users will rate only a small portion of a large set of items, which makes the ratings matrix very sparse. Nearest neighbor algorithms (Herlocker et al., 1999) require a coincidence of ratings to produce user matches. That is, for two users to have a measurable similarity, both must have rated some common set of products. The sparsity problem causes degradation in accuracy and coverage (Konstan et al., 1997; Sarwar et al., 1998). Without sufficient ratings, the CF algorithm cannot find highly correlated users in many instances.
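The dependence of user matching on co-rated items can be sketched as follows. Pearson correlation is used here as one common choice of user-to-user similarity measure; it is an assumption made for illustration, not necessarily the measure used in the systems cited above. The function name, the `min_overlap` parameter and the sample data are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional


def pearson_similarity(
    a: Dict[str, float], b: Dict[str, float], min_overlap: int = 2
) -> Optional[float]:
    """Pearson correlation computed over the items both users have rated.

    Returns None when the users share too few rated items, or when either
    user's ratings do not vary over the shared items, because the
    correlation is undefined in those cases.
    """
    common = sorted(set(a) & set(b))
    if len(common) < min_overlap:
        return None
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings on a 1-5 scale, keyed by item.
ann = {"book": 5.0, "lamp": 3.0, "desk": 1.0}
bob = {"book": 4.0, "lamp": 3.0, "desk": 2.0}
cat = {"vase": 1.0}

ann_bob = pearson_similarity(ann, bob)  # three co-rated items
ann_cat = pearson_similarity(ann, cat)  # no co-rated items
```

In this sketch, ann and bob can be compared because they share three rated items, whereas no similarity can be computed between ann and cat. In a sparse matrix the second case dominates, which is how sparsity reduces coverage.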
When a CF-based system is first used, a cold start period begins in which the ratings matrix is empty (i.e. recommendation is impossible) or extremely sparse (i.e. recommendation quality is extremely low). Similarly, the first rater problem occurs when a new item is added: it cannot be recommended because no one has rated it. Proposed solutions to these problems involve using item-to-item similarity. In addition, the computational complexity of nearest neighbor algorithms increases with the number of products and the number of customers, limiting the scalability of such systems.
Further, most CF implementations force users to engage in the obtrusive and time-consuming task of rating items (Perkowitz et al., 2000), which may deter potential users. In many contexts, people cannot or will not explicitly rate a sufficient number of items. Even when rating sparsity is not a problem, explicitly expressed ratings may suffer from self-reporting bias: users may be reluctant to provide feedback or may provide false feedback. One solution to this problem involves using clickstream data (i.e. navigation patterns) instead of ratings (Mobasher et al., 2002).
CF-based systems that find items similar to an example item are exploiting item-to-item similarities to make recommendations. One use of such systems is to increase cross-sells, but a more sophisticated application involves the construction of a pseudo-item, the ideal item for this customer. This pseudo-item is operationalized as a vector in the same format as item vectors (see Deerwester et al., 1990). This representation can be compared to representations of real items using the techniques described by Deerwester et al. (1990). The item-to-item system then recommends the item(s) closest to the pseudo-item. Latent Semantic Indexing (LSI) (Deerwester et al., 1990) is one technology capable of uncovering the latent semantic relationships among documents based only on their keywords.
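The recommendation of the item(s) closest to a pseudo-item can be sketched as below. Cosine similarity is assumed here as the closeness measure, and the two-dimensional item vectors, the item names and the function names are hypothetical; in practice the vectors would come from whatever item representation the system uses.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(
    pseudo_item: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    top_n: int = 1,
) -> List[Tuple[str, float]]:
    """Return the top_n real items ranked by similarity to the pseudo-item."""
    scored = [(name, cosine(pseudo_item, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical item vectors and a pseudo-item in the same format.
catalog = {
    "item_a": [1.0, 0.0],
    "item_b": [0.6, 0.8],
    "item_c": [0.0, 1.0],
}
pseudo = [0.9, 0.1]
best = recommend(pseudo, catalog, top_n=2)
```

Because the pseudo-item shares the format of the real item vectors, no special handling is needed: it is scored against the catalog exactly as any real item would be.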
LSI works by constructing vectors that represent documents and using measures of distance between vectors to indicate the similarity of the corresponding documents. First, each document in a corpus is reduced to a vector of keyword frequencies. After singular value decomposition is used to reduce dimensionality, the similarity between any two documents can be measured by a function of the angle between their vectors (Deerwester et al., 1990) or a function of the distance between their corresponding points in the solution space. In a document-search context, the pseudo-document vector might be created by taking a weighted average of the vectors of all documents rated so far, with the weights calculated from the ratings. Because LSI extracts conceptual information, it resolves the problems caused by the many-to-many relationship between concepts and keywords, specifically synonymy (two words having a shared meaning) and polysemy (two meanings sharing the same keyword). However, LSI is intended for unstructured data, such as natural language descriptions, and does not effectively use structured data, such as that often associated with online purchasing: price, size, etc.
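The steps described above can be written compactly as follows. The notation is an assumption introduced here for illustration: X is the keyword-by-document frequency matrix (t keywords, n documents), k is the number of retained dimensions, d_i is the reduced vector for document i, R is the set of documents rated so far, and w_i is a weight derived from the rating of document i by some function that is left unspecified. Taking the rows of V_k Σ_k as document coordinates is one common convention.

```latex
% Truncated singular value decomposition of the keyword-by-document matrix
X = U \Sigma V^{\mathsf{T}} \;\approx\; X_k = U_k \Sigma_k V_k^{\mathsf{T}},
\qquad k \ll \min(t, n)

% Document i is represented by row i of V_k \Sigma_k, written d_i.
% Similarity as a function of the angle between two document vectors:
\operatorname{sim}(d_i, d_j) = \cos\theta_{ij}
  = \frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}

% Pseudo-document as a rating-weighted average of the rated documents
% (assumes the weights do not sum to zero):
p = \frac{\sum_{i \in R} w_i \, d_i}{\sum_{i \in R} w_i}
```

Since p lies in the same k-dimensional space as every d_i, it can be compared with real documents using the same similarity function.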
FindMe systems (Burke, 2000) guide searchers through the search process by using examples. Users discard a series of unsatisfactory items, indicating through a set of conversational buttons which aspect of each item is most disappointing, until an acceptable item is found. If, for example, the user indicates, “Too Expensive,” the next example will be similar to the previous one, but with a lower value in the cost dimension, if such an example exists. Although FindMe systems can be effective in many situations, they are inherently conspicuous; like a search tool, they must be consciously selected and endured, and are therefore not appropriate where transparency is desired (see Burke (1999) and Burke (2000) for more details).
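The “Too Expensive” step can be illustrated with a minimal sketch. This is not the FindMe implementation; it assumes, purely for illustration, that each item is a dictionary of numeric attributes on comparable scales and that similarity is measured by squared distance over every attribute other than cost. The attribute names, item names and function name are hypothetical.

```python
from typing import Dict, Optional

Item = Dict[str, float]


def cheaper_alternative(
    current: str, catalog: Dict[str, Item], cost_key: str = "price"
) -> Optional[str]:
    """Pick the item most similar to `current` among those that cost less.

    Returns None when no cheaper item exists.
    """
    reference = catalog[current]
    candidates = [
        name for name, item in catalog.items()
        if item[cost_key] < reference[cost_key]
    ]
    if not candidates:
        return None
    other_keys = [key for key in reference if key != cost_key]

    def distance(name: str) -> float:
        # Squared distance over all attributes except cost.
        return sum((catalog[name][k] - reference[k]) ** 2 for k in other_keys)

    return min(candidates, key=distance)


# Hypothetical catalog with one cost attribute and two other attributes.
catalog = {
    "r1": {"price": 40.0, "quality": 8.0, "distance": 2.0},
    "r2": {"price": 25.0, "quality": 7.0, "distance": 3.0},
    "r3": {"price": 15.0, "quality": 3.0, "distance": 9.0},
    "r4": {"price": 50.0, "quality": 8.0, "distance": 2.0},
}
next_example = cheaper_alternative("r1", catalog)
no_example = cheaper_alternative("r3", catalog)
```

Starting from r1, the sketch proposes r2, which is cheaper yet close on the remaining attributes, rather than r3, which is cheaper still but much less similar. From r3, the cheapest item, no further example exists.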