Matrix analysis techniques, e.g., singular value decomposition (SVD), have been widely used in various data analysis applications. An important class of applications is to predict missing elements given a partially observed random matrix. For example, putting ratings of users into a matrix form, the goal of collaborative filtering is to predict those unseen ratings in the matrix.
Various probabilistic models have been used to analyze such matrices. Let $Y$ be a $p \times m$ observational matrix and $T$ be the underlying $p \times m$ noise-free random matrix. The system assumes that $Y_{i,j} = T_{i,j} + \epsilon_{i,j}$, $\epsilon_{i,j} \sim \mathcal{N}(0, \sigma^2)$, where $Y_{i,j}$ denotes the $(i,j)$-th element of $Y$. If $Y$ is partially observed, then $Y_{\mathbb{I}}$ denotes the set of observed elements and $\mathbb{I}$ is the corresponding index set.
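The observation model above can be illustrated with a minimal sketch. The function name `simulate_observations`, the example matrix `T`, and the random choice of observed entries are assumptions made for illustration only; they are not part of the models described here.

```python
import numpy as np

def simulate_observations(T, sigma, observed_fraction, seed=0):
    """Return (Y, mask) where Y = T + noise and mask marks the observed entries."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=T.shape)      # eps_ij ~ N(0, sigma^2)
    Y = T + noise
    # Assumption: each entry is observed independently with the given probability.
    mask = rng.random(T.shape) < observed_fraction
    return Y, mask

p, m = 4, 6
T = np.outer(np.arange(1, p + 1), np.ones(m))         # hypothetical noise-free matrix
Y, mask = simulate_observations(T, sigma=0.1, observed_fraction=0.5)

index_set = np.argwhere(mask)    # observed (i, j) pairs, i.e. the index set
Y_observed = Y[mask]             # values of the observed elements
```

Here `index_set` plays the role of the index set of observed entries and `Y_observed` holds the corresponding observed values, listed in the same row-major order.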
One model, Probabilistic Principal Component Analysis (PPCA), assumes that $y_j$, the $j$-th column vector of $Y$, can be generated from a latent vector $v_j$ in a $k$-dimensional linear space ($k < p$). The model is defined as

$$y_j = W v_j + \mu + \epsilon_j, \quad v_j \sim \mathcal{N}_k(v_j; 0, I_k),$$

where $\epsilon_j \sim \mathcal{N}_p(\epsilon_j; 0, \sigma^2 I_p)$, and $W$ is a $p \times k$ loading matrix. By integrating out $v_j$, the system obtains the marginal distribution $y_j \sim \mathcal{N}_p(y_j; \mu, W W^T + \sigma^2 I_p)$. Since the columns of $Y$ are conditionally independent, PPCA with $\mu = 0$ is equivalent to

$$Y_{i,j} = T_{i,j} + \epsilon_{i,j}, \quad T \sim \mathcal{N}_{p,m}(T; 0, S, I_m),$$

where $S = W W^T$, and $\mathcal{N}_{p,m}(\cdot\,; 0, S, I_m)$ is a matrix-variate normal prior with zero mean, covariance $S$ between rows, and identity covariance $I_m$ between columns. PPCA aims to estimate the parameter $W$ in order to obtain the covariance matrix $S$. There is no further prior on $W$.
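As a rough numerical check of the PPCA marginal covariance, the following sketch samples columns from the generative model and compares their empirical covariance with $W W^T + \sigma^2 I_p$. The dimensions, the randomly drawn `W`, and the choice $\mu = 0$ are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, m = 5, 2, 100_000
sigma = 0.5
W = rng.normal(size=(p, k))              # hypothetical p x k loading matrix
mu = np.zeros(p)                         # assumption: zero mean

V = rng.normal(size=(k, m))              # columns v_j ~ N_k(0, I_k)
E = sigma * rng.normal(size=(p, m))      # columns eps_j ~ N_p(0, sigma^2 I_p)
Y = W @ V + mu[:, None] + E              # columns y_j = W v_j + mu + eps_j

S = W @ W.T                              # row covariance of the noise-free matrix T
marginal_cov = S + sigma**2 * np.eye(p)  # covariance after integrating out v_j
empirical_cov = (Y @ Y.T) / m            # valid estimator because mu = 0 here
rel_error = np.max(np.abs(empirical_cov - marginal_cov)) / np.max(np.abs(marginal_cov))
```

With a large number of columns, `rel_error` should be small. Note also that `S` has rank at most `k`, which is the low-rank structure PPCA imposes on the row covariance.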
Another model, the Gaussian Process Latent-Variable Model (GPLVM), formulates a latent-variable model in a slightly unconventional way. It considers the same linear relationship from the latent representation $v_j$ to the observations $y_j$. Instead of treating $v_j$ as random variables, GPLVM assigns a prior on $W$ and treats $\{v_j\}$ as parameters:

$$y_j = W v_j + \epsilon_j, \quad W \sim \mathcal{N}_{p,k}(W; 0, I_p, I_k),$$

where the elements of $W$ are independent Gaussian random variables. By marginalizing out $W$, the system obtains a distribution in which each row of $Y$ is an i.i.d. sample from a Gaussian process prior with covariance $R = V V^T$, where $V = [v_1, \ldots, v_m]^T$. The model has the form

$$Y_{i,j} = T_{i,j} + \epsilon_{i,j}, \quad T \sim \mathcal{N}_{p,m}(T; 0, I_p, R).$$

From a matrix modeling point of view, GPLVM estimates the covariance $R$ between the columns and assumes the rows to be conditionally independent.
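The effect of marginalizing out $W$ can be checked numerically with a small sketch: with the latent vectors held fixed and the entries of $W$ drawn as independent standard normals, the rows of the noise-free matrix are independent draws whose covariance is $V V^T$. The dimensions and the randomly chosen `V` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, m = 100_000, 2, 4
V = rng.normal(size=(m, k))       # rows are the latent vectors v_j^T, held fixed as parameters
W = rng.normal(size=(p, k))       # W ~ N_{p,k}(0, I_p, I_k): i.i.d. standard normal entries
T = W @ V.T                       # column j of T equals W v_j

R = V @ V.T                       # m x m covariance shared by every row of T
empirical_R = (T.T @ T) / p       # average of outer products of the p independent rows
rel_error = np.max(np.abs(empirical_R - R)) / np.max(np.abs(R))
```

Because the rows are independent, increasing `p` here acts like increasing the sample size, and `rel_error` shrinks accordingly.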
A third model, the Hierarchical Gaussian Process (HGP), is a multi-task learning model in which each column of $Y$ is the predictive function of one task, sampled from a Gaussian process prior,

$$y_j = t_j + \epsilon_j, \quad t_j \sim \mathcal{N}_p(0, S),$$

where $\epsilon_j \sim \mathcal{N}_p(0, \sigma^2 I_p)$. It introduces a hierarchical model in which an inverse-Wishart prior is placed on the covariance:

$$Y_{i,j} = T_{i,j} + \epsilon_{i,j}, \quad T \sim \mathcal{N}_{p,m}(0, S, I_m), \quad S \sim \mathcal{IW}_p(\nu, \Sigma).$$

HGP uses the inverse-Wishart prior as a regularizer and obtains a maximum a posteriori (MAP) estimate of $S$.
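To illustrate how an inverse-Wishart prior regularizes the covariance estimate, the following sketch computes a MAP estimate of $S$ in a deliberately simplified setting. It assumes that $T$ is fully observed without noise and that $\mathcal{IW}_p(\nu, \Sigma)$ has density proportional to $|S|^{-(\nu + p + 1)/2} \exp(-\tfrac{1}{2}\operatorname{tr}(\Sigma S^{-1}))$; other parameterizations of the inverse-Wishart shift the denominator. It is not the HGP estimation procedure itself, which must also handle noise and missing entries, and the function names are hypothetical.

```python
import numpy as np

def log_posterior(S, T, nu, Sigma):
    """Unnormalized log posterior of S given noise-free T (assumed IW parameterization)."""
    p, m = T.shape
    _, logdet = np.linalg.slogdet(S)
    S_inv = np.linalg.inv(S)
    # Columns of T are i.i.d. N_p(0, S).
    log_lik = -0.5 * m * logdet - 0.5 * np.trace(S_inv @ T @ T.T)
    # Inverse-Wishart log density up to a constant.
    log_prior = -0.5 * (nu + p + 1) * logdet - 0.5 * np.trace(Sigma @ S_inv)
    return log_lik + log_prior

def map_covariance(T, nu, Sigma):
    """Closed-form maximizer of log_posterior under the assumptions above."""
    p, m = T.shape
    return (Sigma + T @ T.T) / (nu + p + 1 + m)

rng = np.random.default_rng(2)
p, m, nu = 3, 50, 5.0
Sigma = np.eye(p)                  # hypothetical prior scale matrix
T = rng.normal(size=(p, m))        # hypothetical noise-free matrix
S_map = map_covariance(T, nu, Sigma)
```

The estimate blends the sample scatter $T T^T$ with the prior scale $\Sigma$, so the prior matters most when the number of columns $m$ is small.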
To predict unobserved elements in matrices, the structure of the matrices plays an important role, for example, the similarity between columns and between rows. Such structure implies that the elements of a random matrix are no longer independent and identically distributed (i.i.d.). Without the i.i.d. assumption, many machine learning models are not applicable.