1. Field of the Invention
The present invention relates to the problem of modeling the scoring or demand curves for large sets of objects (products or downloads), in cases where the demand behavior is known to exhibit the Long Tail phenomenon. This is particularly relevant in the context of Internet-based commerce, where various businesses have been experimentally proven to behave that way. In particular, the method concentrates on the problem of predicting the full scoring curve using incomplete information. The method works with the scoring values of just a few (reference) objects, plus some quantified measure of similarity between all the objects.
2. Description of the Background
The concept of a “long tail” distribution has been commonly used in diverse fields, like statistics and physics, to refer to phenomena in which the distribution of a magnitude is shown to exhibit a power-law decay as the magnitude approaches very large values. For the purposes of this discussion, power-law decaying distributions are special mainly because of the much slower rate of decay as compared with Gaussian distributions, for example. However, power laws are also special because they show scale-free behavior, meaning that the shape of the curve can be easily rescaled to fit a common (i.e. “universal”) power law of the type x^α. In other words, the exponent α is all that characterizes the distribution curve for large x.
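The scale-free property described above can be illustrated with a short sketch. It is not part of the disclosed method; the function names, the exponent value of -1.2, and the synthetic demand values are assumptions chosen only for illustration. The sketch shows that rescaling x by a fixed factor changes a power law by the same ratio at any point on the curve, and that the exponent can be recovered as the slope of a straight-line fit in log-log coordinates.

```python
import math

def power_law(x, alpha, c=1.0):
    """Pure power law c * x**alpha (alpha is negative for a decaying tail)."""
    return c * x ** alpha

def fit_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x); for a pure power law this is alpha."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Synthetic demand curve over sales ranks 1..1000 (illustrative values only).
ranks = list(range(1, 1001))
demand = [power_law(r, alpha=-1.2, c=5000.0) for r in ranks]

# The fitted slope recovers the assumed exponent.
alpha_hat = fit_exponent(ranks, demand)

# Scale-free behavior: doubling x changes the value by the same factor
# (2**alpha) whether we start at x = 10 or at x = 1000.
ratio_low = power_law(20, -1.2) / power_law(10, -1.2)
ratio_high = power_law(2000, -1.2) / power_law(1000, -1.2)

print(alpha_hat, ratio_low, ratio_high)
```

Real demand data would not follow a pure power law over its whole range, so in practice such a fit would be restricted to the tail region.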
In the context of the new Internet-based economy, the popularization of the concept of “The Long Tail” is attributed to Chris Anderson. In his first article in Wired Magazine (Anderson, “The Long Tail”, Wired, Issue 12.10, October 2004) and then later in his book (Anderson, “The Long Tail: Why the Future of Business Is Selling Less of More” (New York: Hyperion Press 2006)), Anderson shows how, for most of the new big Internet retailers, the demand exhibits a long tail behavior. Note that this actually concerns the demand curve for the universe of items on sale, when these are ordered by sales rank. Although it may be tempting to think of it as a “probability distribution” for the number of sales, this could be misleading and lead to wrong analyses. Notwithstanding a few criticisms (notably, Tan et al., “Is Tom Cruise Threatened? Using Netflix Prize Data to Examine the Long Tail of Electronic Commerce”, July 2009, Wharton, University of Pennsylvania, available at http://opim.wharton.upenn.edu/~netessin/TanNetessine.pdf), it is widely recognized that the tenets of the theory are experimentally confirmed both for large and small retailers (see Bailey et al., “The Long Tail is Longer than You Think: The Surprisingly Large Extent of Online Sales by Small Volume Sellers”, May 13, 2008, available at SSRN: http://ssrn.com/abstract=1132723).
The mechanisms by which the long tail behavior appears are well known: the new era of on-line retail allows businesses to enlarge their product catalog endlessly, because shelf-space costs are nearly zero. Once consumers are offered limitless variety, it is to be expected that the demand curves extend their shape to more and more items. However, the non-obvious aspect of the theory is that the particular shape of the tail is a power-law tail (see FIG. 1). The implications for business models are then clear: an internet business can now monetize the tail of the long tail distribution of the demand. Moreover, the demand in the whole tail can actually add up to a percentage of sales that rivals the head of the curve (see FIG. 2). Today, it is evident that the most successful Internet businesses have been those with the vision and skills to monetize the long tail of the demand (see, for instance, Levy, “In the Plex: How Google thinks, works, and shapes our lives” (Simon & Schuster 2011)).
Therefore, it has become quite important to accurately model and predict the long tail part of a demand curve, in order to optimize the economic value extracted from it. Such modeling enables better quantification of targeted marketing or recommendation system efforts. Although the long tail framework is quite recent, many publications and innovations make use of it in one way or another.
U.S. Patent App. Pub. No. 2007/0294733 by Aaron et al. describes methods for facilitating content-based selection in long tail business models, based on the position of the requested item on a content demand curve.
Another area of interest is that of destroying or minimizing any remaining barriers to a full long tail business; in other words, ensuring that the shelving costs remain close to zero. For instance, in U.S. Pat. No. 6,223,205 granted to Harchol-Balter et al., a method is disclosed for assigning tasks in a distributed server system, intended to optimize requests for service in the scenario of heavy tailed distributions. U.S. Pat. No. 7,707,215 granted to Huberman et al. describes a pari-mutuel content provisioning method for peer-to-peer networks, intended to provide a wide diversity of content offerings while responding adaptively to customer demand. Files are served and paid for through a pari-mutuel market (similar to that commonly used for betting in horse races), and it is shown that the system achieves an equilibrium with a long tail in the distribution of content offerings, guaranteeing the real-time provision of any content regardless of its popularity.
U.S. Pat. No. 7,720,933 granted to Gordon et al. discloses an end-to-end data transfer method in which a multi-tiered control system combines the best features of a centralized system and peer-to-peer systems in order to minimize the problems associated with serving “obscure” content (the far end of the long tail distribution, i.e. non-popular or less sold contents). U.S. Patent App. Pub. No. 2010/0332595 by Fullagar et al. also deals with the problems related to handling long tail content in a delivery network. It discloses a method consisting of a hierarchy of servers designed to cache a universe of items with a long tailed demand curve.
U.S. Pat. No. 7,647,332 to Van Flandern et al. shows methods to deal with the problem of content discovery in the context of abundant long tail commerce, in the form of an aggregating interface.
A different set of problems includes those related to the prediction of the scoring of particular items, and the related problem of item similarity. Targeted marketing campaigns and recommendation systems make use of these two key concepts; therefore, they are crucial for the successful exploitation of long tail markets. See for instance, Ardissono et al., “User Modeling and Recommendation Techniques for Personalized Electronic Program Guides”, pp. 3-26 in: Personalized Digital Television, Human-Computer Interaction Series Vol. 6, Eds. Ardissono et al., Springer, Netherlands, 2004. U.S. Pat. No. 6,115,718 granted to Huberman et al. discloses a method for predicting document access in a collection of linked documents featuring link probabilities, which may be interpreted as similarities in other long tail contexts. The method works by simulating a “law of surfing”, and achieves a scoring index that predicts the likelihood of access. U.S. Pat. No. 7,734,641 granted to Kanigsberg et al. discloses a system for recommendations, which is primarily based on the interpretation (using the semantic content of natural language) of users' searches, but also uses the popularity index of the items.
In U.S. Pat. Nos. 7,949,627; 7,885,904; 7,792,815; 7,774,341; 7,657,526; and 7,529,741, all granted to Aravamudan et al., several methods are disclosed to score the contents for each particular user in order to achieve better customized recommendations.
U.S. Patent Appl. Pub. No. 2010/0268661 by Levy et al. discloses a method for building a recommendation system using two supervised learning techniques: categorical training, where recommended items are based upon similar categories; and similar-to related training, where similar items are used to find related items.
In “Factorization meets the neighborhood: a multifaceted collaborative filtering model”, Proc. 14th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08), pp. 426-434, 2008, Koren advances the art of recommendation systems by merging the two most common approaches for exploiting collaborative filtering, namely factorization (i.e. profiling of users and products) and modeling of “neighborhoods” based on similarity. The author, who tested his methods on the dataset that Netflix™ made available in 2006, recognizes the power of neighborhood methods, as they work only on items and do not need to compare users to items.
A different issue of concern here is the construction of a demand curve a priori, or the related problem of predicting the relative score of a new item in the universe. The method disclosed herein addresses these two issues. One source of inspiration comes from the well-known utility function theorem described in Von Neumann et al., “Theory of Games and Economic Behavior”, Third Ed. (Princeton University Press, 1953), which asserts that there exists a function that is able to reproduce the outcomes of a set of pair-wise preferences between the items in the set. The other comes from the Elo rating system for ranking chess players, a process by which the relative skills between players end up producing a scoring curve that approximates the expected distribution (a Gaussian in this case). See Elo, “The Rating of Chessplayers, Past and Present” (Arco, 1978; Ishi Press reprint, 2008) and Harkness, “Official Chess Handbook” (McKay, 1973). Invented by the Hungarian-born American physicist and chess master Arpad Elo, the Elo method works by exchanging rating values between the two players of each match according to its result, using a precise formula designed to reproduce a Gaussian distribution. After a sufficiently large number of tournaments, the emergent curve of Elo ratings does reproduce the expected distribution. The Elo system was invented as an improved chess rating system, but today it is also used in many other multiplayer games and competitions. Although statistical tests have shown that chess performance is not exactly normally distributed, the method continues to be used with modified formulas and is still referred to as the Elo system.
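As an illustration of the rating exchange described above, the following minimal sketch implements the widely used logistic variant of the Elo update, one of the modified formulas still referred to as the Elo system. It is not the formula of the method disclosed herein. The function names, the K-factor of 32, and the 400-point scale are conventional choices assumed here for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (logistic Elo variant)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return the updated (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    The points gained by one player are exactly those lost by the
    other, so the sum of the two ratings is conserved.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two equally rated players: the winner takes k/2 points from the loser.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)

# An upset: the lower-rated player wins and gains more than k/2 points.
upset_a, upset_b = elo_update(1400.0, 1600.0, 1.0)

print(new_a, new_b, upset_a, upset_b)
```

Because each update only moves points from one player to the other, the overall shape of the rating curve emerges from many pair-wise results rather than from any global fitting step.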
There are not many studies directly related to the a priori modeling of the demand curve. U.S. Patent Appl. Pub. No. 2010/0121857 by Elmore et al. discloses an Internet-based method for ranking artists using a popularity profile. It is relevant here because it is a method that turns dispersed information about preferences in popularity into a unified score that allows a ranking of all artists. In “Recommendation Networks and the Long Tail of Electronic Commerce”, Sep. 1, 2010, available at SSRN: http://ssrn.com/abstract=1324064, Oestreicher-Singer et al. describe an approach to the study of the long tail demand curve from an interesting perspective: they analyze the effect of an existing system (recommendation networks) on the flattening of the curve. Alternatively, in “Open Mobile Platforms: Modeling the Long-Tail of Application Usage”, Fourth International Conference on Internet and Web Applications and Services, IEEE, pp. 112-118, May 2009, Verkasalo studies the modeling of the long tail demand curve for smart-phone applications, although from an empirical point of view.