1. Technical Field
The disclosed embodiments relate to statistical methods that provide predictive modeling based on large scale dyadic data, and more particularly, to modeling that simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model.
2. Related Art
Predictive modeling for dyadic data is an important data mining problem encountered in several domains such as social networks, recommendation systems, internet advertising, etc. Such problems involve measurements on dyads, which are pairs of elements from two different sets. Often, a response variable yij attached to dyads (i, j) measures interactions among elements in these two sets. Frequently, accompanying these response measurements are vectors of covariates xij that provide additional information which may help in predicting the response. These covariates could be specific to individual elements in the sets or to pairs from the two sets. In most large scale applications, the data is sparse, high dimensional (i.e., large number of dyads), noisy, and heterogeneous; this makes statistical modeling a challenging task. The following real-world example elucidates further.
Consider an online movie recommendation application such as NetFlix, which involves predicting preference ratings of users for movies. This preference rating can be viewed as a dyadic response variable yij; it depends both on the user i and the movie j and captures interactions that exist among users and movies. Since both user and movie sets are large, the number of possible dyads is astronomical. However, most users rate only a small subset of movies, hence measurements (actual ratings provided by a user) are available only for a small fraction of possible dyads. In addition to the known user-movie ratings, there also exists other predictive information such as demographic information about users, movie content and other indicators of user-movie interactions, e.g., is the user's favorite actor part of the movie cast? These predictive factors can be represented as a vector of covariates xij associated with user-movie dyad (i, j). Incorporating covariate information in the predictive model may improve performance in practice. It is also often the case that some latent unmeasured characteristics that are not captured by these covariates induce a local structure in the dyadic space (e.g., spatial correlations induced due to cultural similarities).