1. Field of the Invention
The present invention relates to data mining methods for discovering and quantifying associations between selectable items, and associations between search queries (or other forms of user input) and selectable items. The selectable items may, for example, be products represented in an electronic catalog, documents, web pages, web sites, media files, and/or other types of items for which behavioral associations can be detected.
2. Description of the Related Art
A variety of methods are known for detecting behavior-based associations (i.e., associations based on user behaviors) between items stored or represented in a database. For example, the purchase histories or item viewing histories of users can be analyzed to detect behavior-based associations between particular items represented in an electronic catalog (e.g., items A and B are related because a relatively large number of those who purchased A also purchased B). See, e.g., U.S. Pat. No. 6,912,505. As another example, the web browsing histories of users can be analyzed to identify behavior-based associations between particular web sites and/or web pages. See, e.g., U.S. Pat. No. 6,691,163 and U.S. Pat. Pub. 2002/0198882.
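By way of illustration only, the purchase-history analysis described above may be sketched as follows. The sketch is not taken from any of the cited references; the function name `co_purchase_scores`, the input layout (one set of item identifiers per user), and the scoring rule (the fraction of purchasers of A who also purchased B) are assumptions chosen for simplicity.

```python
from collections import Counter
from itertools import combinations


def co_purchase_scores(histories):
    """Return {(a, b): fraction of purchasers of a who also purchased b}.

    `histories` is an iterable of per-user collections of item IDs
    (a hypothetical input layout assumed for this sketch).
    """
    item_counts = Counter()   # number of users who purchased each item
    pair_counts = Counter()   # number of users who purchased both items
    for items in histories:
        unique = set(items)   # count each item once per user
        item_counts.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {(a, b): n / item_counts[a] for (a, b), n in pair_counts.items()}


# Illustrative data: four users and three items.
histories = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
scores = co_purchase_scores(histories)
```

In this example, two of the three purchasers of A also purchased B, so the score for the pair (A, B) is 2/3, while both purchasers of C also purchased B, so the score for (C, B) is 1.0. The score is directional: (A, B) and (B, A) generally differ.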
The detected behavior-based associations are typically used to assist users in locating items of interest. For example, in the context of an electronic catalog, when a user accesses an item's detail page, the detail page may be supplemented with a list of related items. This list may, for example, be preceded with a descriptive message such as “people who bought this item also bought the following,” or “people who viewed this item also viewed the following.” The detected associations may also be used to generate personalized recommendations that are based on the target user's purchase history, item viewing history, or other item selections.
It is also known in the art to analyze the search behaviors of users to detect associations between particular search queries and particular items. The detected associations may be used to rank search result items for display, and/or to supplement a search result set with items that do not match the user's search query. For example, when a user conducts a search, the matching items having the strongest behavior-based associations with the submitted search query may be elevated to a more prominent position in the search results listing; in addition, one or more items that do not match the search query, but which have strong behavior-based associations with the search query, may be added to the search result listing. See, e.g., U.S. Pat. No. 6,185,558.
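The two uses of query-to-item associations described above (re-ranking matching items, and adding strongly associated non-matching items) may be sketched as follows. This is a minimal illustration rather than the method of the cited patent; the function `rerank`, the nested-dictionary layout of `associations`, and the `extra_threshold` cutoff are all hypothetical.

```python
def rerank(query, matching_items, associations, extra_threshold=0.5):
    """Order matching items by association strength, then append extras.

    `associations` maps a query string to {item_id: strength}; this layout
    and the `extra_threshold` cutoff are assumptions made for this sketch.
    """
    scores = associations.get(query, {})
    # Stable sort: matching items with equal strength keep their original order.
    ranked = sorted(matching_items,
                    key=lambda item: scores.get(item, 0.0),
                    reverse=True)
    matched = set(matching_items)
    # Non-matching items are added only if strongly associated with the query.
    extras = sorted((item for item, s in scores.items()
                     if item not in matched and s >= extra_threshold),
                    key=lambda item: scores[item],
                    reverse=True)
    return ranked + extras


associations = {"q": {"i3": 0.9, "i1": 0.4, "i9": 0.7, "i8": 0.2}}
results = rerank("q", ["i1", "i2", "i3"], associations)
```

Here `results` is `["i3", "i1", "i2", "i9"]`: item i3 is elevated above the other matching items, and the non-matching item i9 is appended because its association strength meets the assumed threshold, whereas i8 is omitted.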
One problem with relying on behavior-based associations is that the quantity of behavioral data collected for a particular item may be insufficient to create behavior-based associations for that item. This may be the case when, for example, new items are added to an electronic catalog, or when new web pages or documents are added to a data repository. Unfortunately, the problem is self-perpetuating because popular items (items with behavioral associations) typically remain popular due to their heightened exposure, while new and generally unknown items remain unpopular due to their lack of exposure. This problem is sometimes referred to as the “cold-start” problem.
One possible way to reduce the cold-start problem is to supplement the behavior-based associations with content-based associations between items. For example, a new item (one for which little or no behavioral data exists) can be associated with other items based on similarities between the attributes or other content of the items. These content-based associations may then be used to increase the new item's exposure in the same way behavior-based associations are used.
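As a rough illustration of a content-based association, a new item may be compared with existing items by the overlap of their attribute sets. The sketch below uses the Jaccard coefficient, which is only one of many possible similarity measures and is an assumption of this example; the function names, the attribute sets, and the catalog contents are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based_neighbors(new_item_attrs, catalog, top_n=3):
    """Return up to `top_n` (item_id, similarity) pairs, most similar first.

    `catalog` maps item IDs to attribute sets (hypothetical layout).
    Items sharing no attributes with the new item are omitted.
    """
    scored = [(item_id, jaccard(new_item_attrs, attrs))
              for item_id, attrs in catalog.items()]
    scored = [(item_id, s) for item_id, s in scored if s > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


catalog = {
    "laser-printer": {"printer", "laser", "usb"},
    "inkjet-printer": {"printer", "inkjet", "usb", "color"},
    "ink-cartridge": {"ink", "cartridge", "color"},
}
new_printer = {"printer", "inkjet", "usb"}
neighbors = content_based_neighbors(new_printer, catalog)
```

For this data, `neighbors` is `[("inkjet-printer", 0.75), ("laser-printer", 0.5)]`. The ink cartridge shares no attributes with the new printer and therefore receives no content-based association.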
Unfortunately, content-based associations tend to be less reliable than behavior-based associations, especially if the item content is not highly consistent in format. In addition, content-based associations frequently are not a good predictor of the items users desire to purchase, view or otherwise select in combination, and thus tend to be less useful. As one example, suppose that an electronic catalog system displays lists of related products on product detail pages, with these lists generated automatically based on aggregate purchase histories. In such a system, the detail page for a particular product (e.g., a printer) may desirably list products that are very different from, but complementary to, that product, such as commonly purchased accessories for the product (e.g., an ink cartridge for the printer). If content-based associations were used in place of the behavior-based associations, however, these complementary products likely would not appear since their attributes would typically be dissimilar to those of the featured product.