The present disclosure relates to matrix factorization and, more specifically, to methods, systems, and computer program products for performing memory-aware matrix factorization.
Recommendation systems are becoming increasingly pervasive in Internet applications such as music sharing, e-commerce, and on-demand Internet streaming media. Moreover, recommendation systems can be combined with other applications, such as ranking and filtering, to develop new products in online advertising and user-centric information retrieval. A common technique used in recommendation systems is the factorization of a user-item matrix R, whose entry at (u, v) denotes the preference of user u for item v. The user-item matrix R is generally sparse, and matrix factorization is used to estimate the entries that are null or zero. A collaborative filter based on matrix factorization is generally considered one of the best models for recommendation systems.
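As a minimal illustration of the sparse user-item matrix described above, the following sketch stores only the known entries of R in a dictionary keyed by (u, v). Given two factor matrices, a hypothetical X with one row per user and a hypothetical Θ with one row per item, it estimates every missing entry as the dot product of the corresponding user and item rows. The ratings and factor values are made up for illustration, the helper names `predict` and `fill_missing` are assumptions, and how the factors are learned is not shown here.

```python
# Sparse user-item matrix R stored as {(u, v): rating}; keys that are absent
# are the null/zero entries that matrix factorization is asked to estimate.
R = {(0, 0): 5.0, (0, 2): 1.0, (1, 1): 4.0, (2, 0): 4.0, (2, 2): 2.0}

# Hypothetical factors with f = 2 latent features (values chosen by hand):
# X holds one row x_u per user, Theta holds one row theta_v per item,
# so R is approximated by X multiplied by Theta transposed.
X = [[2.0, 0.5], [0.5, 2.0], [1.5, 1.0]]
Theta = [[2.0, 0.5], [0.5, 2.0], [0.25, 1.0]]


def predict(u, v):
    """Estimated preference of user u for item v: the dot product x_u^T theta_v."""
    return sum(a * b for a, b in zip(X[u], Theta[v]))


def fill_missing(num_users, num_items):
    """Return an estimate for every (u, v) position that is absent from R."""
    return {
        (u, v): predict(u, v)
        for u in range(num_users)
        for v in range(num_items)
        if (u, v) not in R
    }
```

For example, `fill_missing(3, 3)` returns estimates for the four absent positions (0, 1), (1, 0), (1, 2), and (2, 1); the estimate for (2, 1) is 1.5 × 0.5 + 1.0 × 2.0 = 2.75.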
The problem of matrix factorization is to decompose matrix R into two dense matrices X and Θ such that $R \approx X \cdot \Theta^T$. Letting $r_{u,v}$ denote a non-zero element of matrix R at position (u, v), the matrix factorization can be accomplished by minimizing the following cost function:
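$$J = \sum_{u,v} \left( r_{u,v} - x_u^T \theta_v \right)^2 + \lambda \left( \sum_{u} n_{x_u} \lVert x_u \rVert^2 + \sum_{v} n_{\theta_v} \lVert \theta_v \rVert^2 \right) \tag{1}$$

where the first sum runs over the non-zero entries $r_{u,v}$ of R; $x_u^T$ and $\theta_v$ are the uth row of X and the vth column of $\Theta^T$, respectively; λ is a regularization parameter; and $n_{x_u}$ and $n_{\theta_v}$ denote the total number of ratings by user u and on item v, respectively.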
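To make cost function (1) concrete, the following sketch evaluates it term by term for a sparse rating dictionary `R = {(u, v): r_uv}`, with X and Θ given as lists of rows (`Theta[v]` plays the role of θ_v). It assumes that $n_{x_u}$ and $n_{\theta_v}$ are the numbers of observed ratings for user u and item v, which the sketch counts directly from R. The function name `cost` and the argument `lam` (for λ) are hypothetical.

```python
def cost(R, X, Theta, lam):
    """Evaluate cost function (1) over the observed entries of R = {(u, v): r_uv}.

    Assumes n_x[u] and n_theta[v] are the numbers of observed ratings for
    user u and item v; both are counted from R below.
    """
    n_x = [0] * len(X)
    n_theta = [0] * len(Theta)
    squared_error = 0.0
    for (u, v), r in R.items():
        prediction = sum(a * b for a, b in zip(X[u], Theta[v]))  # x_u^T theta_v
        squared_error += (r - prediction) ** 2
        n_x[u] += 1
        n_theta[v] += 1
    # Weighted regularization: each squared row norm is scaled by its rating count.
    regularization = sum(n_x[u] * sum(a * a for a in X[u]) for u in range(len(X)))
    regularization += sum(
        n_theta[v] * sum(b * b for b in Theta[v]) for v in range(len(Theta))
    )
    return squared_error + lam * regularization
```

With `lam = 0` the function reduces to the plain sum of squared errors over the observed entries, so the second term can be seen in isolation by comparing the two values.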
Cost function (1) can be optimized with many classical optimization methods; alternating least squares, coordinate descent, and stochastic gradient descent have all been applied to this problem. Matrix factorization is inherently computationally expensive, so parallel computing is often used for real-life, industry-scale matrix factorization problems. Parallelizing the optimization is difficult, however, because many classical algorithms for matrix factorization are sequential rather than parallel. Considerable effort has gone into applying parallel computing methods to matrix factorization, especially on shared-memory, CPU-based systems. However, such methods suffer from locking, discontinuous memory access, and memory hotspots.
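Stochastic gradient descent, one of the classical methods named above, can be sketched serially in a few lines. This is a generic illustration and not the memory-aware method of this disclosure. It assumes that $n_{x_u}$ and $n_{\theta_v}$ in (1) are rating counts, in which case the cost splits exactly into one term per observed rating, $(r_{u,v} - x_u^T \theta_v)^2 + \lambda(\lVert x_u \rVert^2 + \lVert \theta_v \rVert^2)$, and each step follows the gradient of a single such term. The function names, the uniform initialization, and the default hyperparameters (`f`, `lam`, `lr`, `epochs`, `seed`) are hypothetical choices.

```python
import random


def sgd_epoch(R, X, Theta, lam, lr, rng):
    """Run one serial SGD pass over the observed ratings R = {(u, v): r_uv}.

    Assuming n_x_u and n_theta_v are rating counts, cost (1) splits into one
    term per rating: (r_uv - x_u . theta_v)^2 + lam * (|x_u|^2 + |theta_v|^2).
    Each update follows that term's negative gradient (factor 2 folded into lr).
    """
    entries = list(R.items())
    rng.shuffle(entries)  # visit the ratings in a random order each epoch
    for (u, v), r in entries:
        x_u, theta_v = X[u], Theta[v]
        error = r - sum(a * b for a, b in zip(x_u, theta_v))
        # Both gradients are computed from the old x_u and theta_v values.
        new_x = [a + lr * (error * b - lam * a) for a, b in zip(x_u, theta_v)]
        new_theta = [b + lr * (error * a - lam * b) for a, b in zip(x_u, theta_v)]
        # Overwrite the rows; the next rating touching user u or item v reads
        # these updated values, so this loop is inherently sequential.
        X[u], Theta[v] = new_x, new_theta


def train(R, num_users, num_items, f=2, lam=0.01, lr=0.05, epochs=200, seed=0):
    """Randomly initialize X (num_users x f) and Theta (num_items x f), then run SGD."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.1, 1.0) for _ in range(f)] for _ in range(num_users)]
    Theta = [[rng.uniform(0.1, 1.0) for _ in range(f)] for _ in range(num_items)]
    for _ in range(epochs):
        sgd_epoch(R, X, Theta, lam, lr, rng)
    return X, Theta
```

Because every step reads and then overwrites one user row and one item row, two workers processing ratings that share a user or an item would contend for the same rows, which is the kind of conflict behind the locking and memory hotspots mentioned above.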