Recommendation systems provide a discovery experience for shoppers and users. There are two major types of conventional recommendation systems: collaborative filtering (CF) based systems and content based (CB) systems. CF systems depend on actual user events, for example a user consuming (e.g., buying/watching/reading) an item. CF systems may tell a user that “people who saw A also tend to see B and C.” Content based systems describe features (e.g., author, actor, genre) of items. Content based systems may also depend on actual user events. For example, content based systems may tell a user that “this movie has features like this other movie you watched.” Different techniques may be used to compute item similarities and then to provide recommendations based on the similarities. The quality of a content based similarity recommendation varies directly with the quality of the data describing the features. Since data about features may be available in different languages and different dialects and may have different or even dubious quality, some content based similarity recommendations may have questionable value.
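By way of illustration, the “people who saw A also tend to see B and C” behavior may be sketched as a simple co-occurrence count over user histories. This is a minimal sketch of one possible approach and is not any particular system's method; the function name `co_consumed`, the user identifiers, and the sample histories are hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_consumed(histories):
    """For each item, count how often every other item shares a user history with it."""
    counts = {}
    for items in histories.values():
        # set() removes repeat events so each user contributes at most once per pair.
        for a, b in permutations(set(items), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


# Hypothetical user events: user id -> items the user consumed.
histories = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B"],
    "u3": ["A", "C"],
    "u4": ["B", "D"],
}

also_seen = co_consumed(histories)
# Items most often consumed together with "A".
top_for_a = [item for item, _ in also_seen["A"].most_common(2)]
```

In this sketch the recommendation depends only on user events; no item features are consulted.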
Conventional recommendation systems provide information about matches between users (e.g., shoppers) and items (e.g., books, videos, games) or between items and items based on user interests, preferences, history, item features, or other factors. For example, if a system has data that a user has previously accessed a set of items, then a recommendation system may identify similar items and recommend them to the user based on the data about the user's own actions (e.g., “if you liked this, you might like that”). This may be referred to as a user-to-item recommendation, a U2I reco, or as a “pick”. If a system has data that one item has features like another item, then a conventional recommendation system may also provide item-to-item recommendations or “related” recommendations (e.g., “this movie has the same actors and subject matter as this other movie”). These recommendations may be referred to as I2I CB recos. The quality of I2I CB recos depends on the quality of the data associated with the features upon which the recommendation decision is made.
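By way of illustration, an I2I CB reco may be sketched by treating each item as a set of feature tags and ranking other items by the overlap of their tag sets. The sketch below assumes Jaccard similarity as the overlap measure, which is one of several possible choices; the function names, item names, and tags are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two tag sets: size of intersection divided by size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(item, catalog, k=2):
    """Return up to k (other_item, score) pairs, most similar first."""
    scores = [
        (other, jaccard(catalog[item], tags))
        for other, tags in catalog.items()
        if other != item
    ]
    # Sort by descending score; break ties by item name for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical catalog: item -> set of feature tags.
catalog = {
    "movie_1": {"actor:x", "genre:heist", "director:y"},
    "movie_2": {"actor:x", "genre:heist", "director:z"},
    "movie_3": {"actor:w", "genre:romance", "director:z"},
}

recos = related("movie_1", catalog)
```

Because the score is computed entirely from the tags, any error in the tags carries directly into the ranking.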
In conventional content based recommendation systems, items that can be recommended or considered for recommendation may include metadata tags. The metadata tags may be manually curated to facilitate categorization and similarity estimation. While conventional content based recommendation systems may have provided interesting and relevant results, sub-optimal recommendations may have been provided for different reasons. For example, in some domains the feature vocabulary (e.g., set of curated metadata tags) may be sparse or insufficient. By way of illustration, an application may be tagged under a first category (e.g., Games-puzzle) but may actually be a “match-3” type of game. This “match-3” type of game may inadvertently get clustered with other puzzle games if the feature vocabulary is too sparse to convey the additional categorization. In another example, manually curated labels may be wrong or may have different significance between a tagger and a user of the tag. By way of illustration, a domain may rely on a manual curation of features by experts. The features may be binary in that they either exist in relation to the item or they do not exist in relation to the item. Conventionally there may be little if any validation of these manually curated tags. An incorrect or inaccurate tag for a feature may produce negative consequences (e.g., reduced quality) for similarity measures that consider the feature. In another example, a domain may have textual representations in several languages. The translations may vary in quality or in point of view for describing the content of an item. By way of illustration, an English language tag may have been produced by a native English speaker and then a French language tag may have been produced by the same native English speaker, who translated the tag incorrectly. Thus, a single item may have inconsistent and even incorrect tags, which once again may produce sub-optimal or even misleading recommendations.
In yet another example, a source of a tag for a feature may be prone to bias, abuse or even fraud by developers, curators, or other tag producers or feature annotators. Once again there may be little if any validation of these biased or fraudulent tags.
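By way of illustration, the effect of a sparse feature vocabulary in the “match-3” example above may be sketched with binary tags. The sketch assumes Jaccard similarity over tag sets as the similarity measure; the application names and the “word” tag are hypothetical. With only the category tag available, every puzzle game looks identical; adding a finer-grained tag separates the match-3 games from the rest.

```python
def jaccard(a, b):
    """Overlap of two binary tag sets: intersection size over union size."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def scores_for(item, catalog):
    """Similarity of `item` to every other item in the catalog."""
    return {
        other: jaccard(catalog[item], tags)
        for other, tags in catalog.items()
        if other != item
    }


# Sparse vocabulary: only the curated category tag exists.
sparse = {
    "gem_swap": {"Games-puzzle"},
    "candy_lines": {"Games-puzzle"},
    "crossword_pro": {"Games-puzzle"},
}

# Richer vocabulary: a second tag conveys the type of puzzle game.
rich = {
    "gem_swap": {"Games-puzzle", "match-3"},
    "candy_lines": {"Games-puzzle", "match-3"},
    "crossword_pro": {"Games-puzzle", "word"},
}

sparse_scores = scores_for("gem_swap", sparse)  # every other game ties
rich_scores = scores_for("gem_swap", rich)      # match-3 games now rank first
```

The same mechanism applies to an incorrect or fraudulent tag: because each feature is binary and unvalidated, a single wrong tag changes the intersection or the union and shifts the score.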