Recommendation is a fundamental problem that has become increasingly important in the modern era of information overload. The goal of recommendation is to help a user find potentially interesting items in a large repository of items. Recommendation systems are widely used by modern websites in various contexts to target customers and provide them with useful information (for example, Amazon, Google News, Netflix, and Last.fm). A common setting for recommendation systems is to predict how a user would rate an item (such as a movie) given only the past rating history of the users. Many classical recommendation methods have been proposed during the last decade; they fall into two broad categories, content filtering approaches and collaborative filtering methods. Collaborative filtering methods have attracted more attention due to their impressive performance. Matrix factorization plays a crucial role in collaborative filtering and has emerged as a powerful tool for making recommendations on large datasets.
Learning effective latent factors plays an important role in matrix-factorization-based collaborative filtering methods. Traditional matrix factorization methods for collaborative filtering learn the latent factors directly from the user-item rating matrix (i.e., the collection of item ratings given by users). One of the main challenges faced by these systems is predicting ratings when a new user or new item arrives in the system, also known as the cold start problem. The cold start problem is circular in nature: the system will not recommend an item unless it has some ratings for it, and unless the system recommends the item, it may not get ratings for it. Another practical challenge is learning appropriate latent factors when the rating matrix is sparse, which is often the case in real-world scenarios.
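As an illustration of how latent factors are learned directly from observed ratings, the following is a minimal sketch of matrix factorization trained by stochastic gradient descent. It is not taken from any particular prior method; the function names (`factorize`, `predict`, `rmse`), the hyperparameters, and the toy ratings are all assumptions chosen for illustration. Because only observed entries drive the updates, a user with no ratings keeps its random initial factors, which is the cold start problem described above.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Learn user factors P and item factors Q from observed ratings.

    ratings: list of (user, item, rating) triples for observed entries only.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Prediction error on one observed rating.
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the inner product of the two factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error over the given observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Toy data (hypothetical): users 0-2 have ratings, user 3 has none.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = factorize(toy_ratings, n_users=4, n_items=3)
print("training RMSE:", rmse(P, Q, toy_ratings))
print("cold-start user 3 factors (never updated):", P[3])
```

In this sketch the factors of user 3 are never touched by training, so any prediction for that user reflects only the random initialization.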
To overcome these challenges, researchers have suggested incorporating additional sources of information about the users or items, also known as side information. This side information can be obtained from user profiles and item profiles, and may include any number of features of the users and items, such as the demographics of a user or the genre of a movie. User demographics can be used to infer relationships between users, and similarly, item similarity can be used to automatically assign ratings to new items. Side information has been successfully used to aid matrix factorization in various prior works. These methods, however, utilize the side information only as regularization terms in the model, and the learned latent factors may not be very effective because both the ratings and the side information are sparse. To make matrix-factorization-based methods effective in such a setting, it is highly desirable to learn and extract discriminative features from the datasets.
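To make the idea of side information acting as a regularizer concrete, the sketch below adds one common form of such a term, assumed here for illustration: a penalty of the form beta * w_uv * ||p_u - p_v||^2 that pulls the latent factors of similar users together. The similarity matrix `user_sim` stands in for relationships inferred from user demographics; the function name, the weight `beta`, and the toy data are hypothetical and do not reproduce any specific prior work.

```python
import random

def factorize_with_user_similarity(ratings, n_users, n_items, user_sim,
                                   k=2, lr=0.01, reg=0.02, beta=1.0,
                                   epochs=2000, seed=0):
    """Matrix factorization with a user-similarity regularizer.

    user_sim[u][v] is a nonnegative similarity weight (assumed symmetric),
    e.g. 1 if two users share a demographic group. All defaults are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Rating term: fit the observed entries.
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        # Side-information term: move each user toward similar users.
        for u in range(n_users):
            for v in range(n_users):
                w = user_sim[u][v]
                if w and u != v:
                    for f in range(k):
                        P[u][f] -= lr * beta * w * (P[u][f] - P[v][f])
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the inner product of the two factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Toy data (hypothetical): user 2 has no ratings but is similar to user 0.
toy_ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 5.0)]
toy_sim = [[0, 0, 1],
           [0, 0, 0],
           [1, 0, 0]]
P, Q = factorize_with_user_similarity(toy_ratings, 3, 2, toy_sim)
print("user 0 on items 0, 1:", predict(P, Q, 0, 0), predict(P, Q, 0, 1))
print("user 2 on items 0, 1:", predict(P, Q, 2, 0), predict(P, Q, 2, 1))
```

Here the new user inherits the preferences of a similar user through the regularizer alone. The side information shapes the factors only indirectly, which is the limitation noted above when the ratings and the side information are both sparse.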