Recommender Systems
Collaborative Filtering
In a broad sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple users, agents, and data sources. Overall, CF techniques can be categorized into: memory-based CF, model-based CF, and their hybrid.
Matrix Factorization
Matrix factorization is a class of collaborative filtering models. Specifically, the model factorizes the user-item interaction matrix (e.g., rating matrix) into the product of two lower-rank matrices, capturing the low-rank structure of the user-item interactions.
Let $\mathbf{R} \in \mathbb{R}^{m \times n}$ denote the interaction matrix with $m$ users and $n$ items, where the values of $\mathbf{R}$ represent explicit ratings. The user-item interactions are factorized into a user latent matrix $\mathbf{P} \in \mathbb{R}^{m \times k}$ and an item latent matrix $\mathbf{Q} \in \mathbb{R}^{n \times k}$, where $k \ll m, n$ is the latent factor size. For a given item $i$, the elements of $\mathbf{q}_i$ measure the extent to which the item possesses characteristics such as the genres and languages of a movie. For a given user $u$, the elements of $\mathbf{p}_u$ measure the extent of the user's interest in the items' corresponding characteristics.
The predicted ratings can be estimated by

$$\hat{\mathbf{R}} = \mathbf{P}\mathbf{Q}^\top,$$

where $\hat{\mathbf{R}} \in \mathbb{R}^{m \times n}$ is the predicted rating matrix. One major problem of this prediction rule is that user/item biases cannot be modeled, so the rule is extended with bias terms for each user and item:

$$\hat{\mathbf{R}}_{ui} = \mathbf{p}_u \mathbf{q}_i^\top + b_u + b_i.$$
Then, we train the matrix factorization model by minimizing the mean squared error between the predicted rating scores and the real rating scores:

$$\underset{\mathbf{P}, \mathbf{Q}, b}{\operatorname{argmin}} \sum_{(u, i) \in \mathcal{K}} \| \mathbf{R}_{ui} - \hat{\mathbf{R}}_{ui} \|^2 + \lambda \left( \| \mathbf{P} \|_F^2 + \| \mathbf{Q} \|_F^2 + b_u^2 + b_i^2 \right),$$

where $\lambda$ denotes the regularization rate and $\mathcal{K}$ is the set of observed (user, item) rating pairs.
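As a concrete illustration, here is a minimal PyTorch sketch of this biased matrix factorization model; the class name, sizes, and hyperparameter values are illustrative, not from any particular library.

```python
import torch
from torch import nn

class BiasedMF(nn.Module):
    """Matrix factorization with biases: r_hat(u, i) = p_u . q_i + b_u + b_i."""
    def __init__(self, num_users, num_items, k=30):
        super().__init__()
        self.P = nn.Embedding(num_users, k)    # user latent factors
        self.Q = nn.Embedding(num_items, k)    # item latent factors
        self.b_u = nn.Embedding(num_users, 1)  # user biases
        self.b_i = nn.Embedding(num_items, 1)  # item biases

    def forward(self, user, item):
        dot = (self.P(user) * self.Q(item)).sum(dim=1)
        return dot + self.b_u(user).squeeze(1) + self.b_i(item).squeeze(1)

# Training minimizes MSE over observed ratings; the L2 regularization term
# (lambda above) can be approximated via the optimizer's weight_decay.
model = BiasedMF(num_users=1000, num_items=1700)   # illustrative sizes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-5)
loss_fn = nn.MSELoss()
```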
AutoRec
Although the matrix factorization model achieves decent performance on the rating prediction task, it is essentially a linear model. Thus, such models are not capable of capturing complex nonlinear and intricate relationships that may be predictive of users’ preferences.
AutoRec identifies collaborative filtering (CF) with an autoencoder architecture and aims to integrate nonlinear transformations into CF on the basis of explicit feedback. On the other hand, AutoRec differs from a traditional autoencoder: rather than learning the hidden representations, AutoRec focuses on learning/reconstructing the output layer. It uses a partially observed interaction matrix as input, aiming to reconstruct a completed rating matrix.
Let $\mathbf{R}_{*i}$ denote the $i^\textrm{th}$ column of the rating matrix, where unknown ratings are set to zeros by default.
The neural architecture is defined as:

$$h(\mathbf{R}_{*i}) = f\left( \mathbf{W} \cdot g\left( \mathbf{V} \mathbf{R}_{*i} + \mu \right) + b \right),$$

where $f(\cdot)$ and $g(\cdot)$ are activation functions, $\mathbf{W}$ and $\mathbf{V}$ are weight matrices, and $\mu$ and $b$ are biases.
The following objective function aims to minimize the reconstruction error:

$$\underset{\mathbf{W}, \mathbf{V}, \mu, b}{\operatorname{argmin}} \sum_{i=1}^{n} \| \mathbf{R}_{*i} - h(\mathbf{R}_{*i}) \|_{\mathcal{O}}^2 + \lambda \left( \| \mathbf{W} \|_F^2 + \| \mathbf{V} \|_F^2 \right),$$

where $\| \cdot \|_{\mathcal{O}}$ means that only the contribution of observed ratings is considered; that is, only weights associated with observed inputs are updated during back-propagation.
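A minimal PyTorch sketch of item-based AutoRec under these definitions, with the observed-only reconstruction loss implemented as a mask over nonzero entries (layer sizes and names are illustrative):

```python
import torch
from torch import nn

class AutoRec(nn.Module):
    """Item-based AutoRec: reconstruct a rating column via f(W g(V r + mu) + b)."""
    def __init__(self, num_users, hidden=500):
        super().__init__()
        self.encoder = nn.Linear(num_users, hidden)  # V and mu
        self.decoder = nn.Linear(hidden, num_users)  # W and b

    def forward(self, r):  # r: a batch of rating columns, shape (batch, num_users)
        return self.decoder(torch.sigmoid(self.encoder(r)))

def masked_mse(pred, r):
    """Only observed ratings (nonzero entries) contribute to the loss,
    mirroring the || . ||_O notation above."""
    mask = (r != 0).float()
    return ((pred - r) ** 2 * mask).sum() / mask.sum().clamp(min=1)
```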
Personalized Ranking
In general, personalized ranking models can be optimized with pointwise, pairwise, or listwise approaches. Pointwise approaches consider a single interaction at a time and train a classifier or regressor to predict individual preferences; matrix factorization and AutoRec are optimized with pointwise objectives. Pairwise approaches consider a pair of items for each user and aim to approximate the optimal ordering for that pair. Listwise approaches approximate the ordering of the entire list of items, for example, by directly optimizing ranking measures such as Normalized Discounted Cumulative Gain (NDCG).
Bayesian Personalized Ranking Loss
Bayesian personalized ranking (BPR) is a pairwise personalized ranking loss that is derived from the maximum posterior estimator. The training data of BPR consists of both positive and negative pairs (missing values). It assumes that the user prefers the positive item over all other non-observed items. Formulating the maximum posterior estimator yields the generic optimization criterion for the personalized ranking task:

$$
\begin{aligned}
\textrm{BPR-OPT} &:= \ln p(\Theta \mid >_u) \\
&\propto \ln p(>_u \mid \Theta) \, p(\Theta) \\
&= \ln \prod_{(u, i, j) \in D} \sigma(\hat{y}_{ui} - \hat{y}_{uj}) \, p(\Theta) \\
&= \sum_{(u, i, j) \in D} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \ln p(\Theta) \\
&= \sum_{(u, i, j) \in D} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) - \lambda_\Theta \| \Theta \|^2,
\end{aligned}
$$

where $\Theta$ represents the parameters of an arbitrary recommendation model, $>_u$ represents the desired personalized total ranking of all items for user $u$, $D$ is the training set of (user, positive item, negative item) triples, $\sigma$ is the logistic sigmoid, $\lambda_\Theta$ is a model-specific regularization parameter, and $\hat{y}_{ui}$ and $\hat{y}_{uj}$ are the predicted scores of user $u$ for items $i$ and $j$, respectively.
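A minimal sketch of this criterion as a PyTorch loss, negated since optimizers minimize; the regularizer $\lambda_\Theta \| \Theta \|^2$ is assumed to be handled separately, e.g. via the optimizer's weight decay:

```python
import torch
import torch.nn.functional as F

def bpr_loss(y_ui, y_uj):
    """Negative BPR-OPT (regularizer omitted), averaged over the batch:
    minimize -ln sigma(y_ui - y_uj). logsigmoid is used for numerical stability."""
    return -F.logsigmoid(y_ui - y_uj).mean()

# Usage: y_ui / y_uj are model scores for sampled positive / negative items,
# e.g. loss = bpr_loss(model(u, i_pos), model(u, i_neg))
```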
Hinge Loss
The hinge loss used for ranking has the following form:

$$\sum_{(u, i, j) \in D} \max\left( m - \hat{y}_{ui} + \hat{y}_{uj}, 0 \right),$$

where $m$ is the safety margin size. It aims to push negative items away from positive items.
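An analogous PyTorch sketch of the pairwise hinge loss, averaged over the batch (the default margin value is illustrative):

```python
import torch

def hinge_loss(y_ui, y_uj, margin=1.0):
    """Pairwise hinge ranking loss: max(m - y_ui + y_uj, 0), mean over the batch."""
    return torch.clamp(margin - y_ui + y_uj, min=0).mean()
```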
NeuMF
This model leverages the flexibility and non-linearity of neural networks to replace the dot products of matrix factorization, aiming to enhance the model's expressiveness. Specifically, it is structured with two subnetworks, generalized matrix factorization (GMF) and a multilayer perceptron (MLP), and models the interactions through these two pathways instead of simple dot products. The outputs of the two subnetworks are concatenated to compute the final prediction scores.
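A minimal PyTorch sketch of this two-pathway architecture; the embedding size and MLP dimensions are illustrative:

```python
import torch
from torch import nn

class NeuMF(nn.Module):
    """GMF pathway (elementwise product of embeddings) + MLP pathway,
    concatenated for the final score."""
    def __init__(self, num_users, num_items, k=16, mlp_dims=(32, 16)):
        super().__init__()
        # Each pathway keeps its own embeddings
        self.P_gmf = nn.Embedding(num_users, k)
        self.Q_gmf = nn.Embedding(num_items, k)
        self.P_mlp = nn.Embedding(num_users, k)
        self.Q_mlp = nn.Embedding(num_items, k)
        layers, d = [], 2 * k
        for h in mlp_dims:
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(k + d, 1)  # score from the concatenated pathways

    def forward(self, user, item):
        gmf = self.P_gmf(user) * self.Q_gmf(item)  # generalized matrix factorization
        mlp = self.mlp(torch.cat([self.P_mlp(user), self.Q_mlp(item)], dim=1))
        return self.out(torch.cat([gmf, mlp], dim=1)).squeeze(1)
```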
Sequence-Aware Recommender Systems
Caser, short for convolutional sequence embedding recommendation model, adopts convolutional neural networks to capture the dynamic pattern influences of users' recent activities. Its main component consists of a horizontal convolutional network and a vertical convolutional network, aiming to uncover union-level and point-level sequence patterns, respectively. The goal of Caser is to recommend items by considering users' general tastes as well as their short-term intentions.
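A simplified PyTorch sketch of this architecture; all sizes are illustrative, and details such as dropout and the training objective are omitted:

```python
import torch
from torch import nn

class Caser(nn.Module):
    """Horizontal convs capture union-level patterns over the last L items,
    a vertical conv captures point-level patterns; both are combined with a
    user embedding representing general taste."""
    def __init__(self, num_users, num_items, k=16, L=5, n_h=4, n_v=4):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, k)  # general taste
        self.item_emb = nn.Embedding(num_items, k)
        # Horizontal filters of heights 1..L over the (L, k) embedding "image"
        self.conv_h = nn.ModuleList(
            [nn.Conv2d(1, n_h, (h, k)) for h in range(1, L + 1)])
        self.conv_v = nn.Conv2d(1, n_v, (L, 1))     # vertical filters
        self.fc = nn.Linear(n_h * L + n_v * k, k)
        self.out = nn.Linear(2 * k, num_items)      # concat with user embedding

    def forward(self, user, seq):                   # seq: (batch, L) item ids
        x = self.item_emb(seq).unsqueeze(1)         # (batch, 1, L, k)
        h = [torch.relu(conv(x)).squeeze(3).max(dim=2).values
             for conv in self.conv_h]               # union-level features
        v = torch.relu(self.conv_v(x)).flatten(1)   # point-level features
        z = torch.relu(self.fc(torch.cat(h + [v], dim=1)))
        return self.out(torch.cat([z, self.user_emb(user)], dim=1))
```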
Factorization Machines
The strengths of factorization machines over linear regression and matrix factorization are: (1) they can model $\chi$-way variable interactions, where $\chi$ is the polynomial order and is usually set to two; (2) a fast optimization algorithm associated with factorization machines can reduce the polynomial computation time to linear complexity, making them extremely efficient, especially for high-dimensional sparse inputs.
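A minimal PyTorch sketch of a two-way factorization machine using the standard linear-time reformulation of the pairwise term, $\sum_{i<j} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2} \sum_{l} \big( (\sum_i v_{il} x_i)^2 - \sum_i v_{il}^2 x_i^2 \big)$; names and sizes are illustrative:

```python
import torch
from torch import nn

class FactorizationMachine(nn.Module):
    """Two-way FM: y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j,
    with the pairwise term computed in O(k d) instead of O(k d^2)."""
    def __init__(self, d, k=10):
        super().__init__()
        self.w0 = nn.Parameter(torch.zeros(1))           # global bias
        self.w = nn.Linear(d, 1, bias=False)             # linear weights
        self.V = nn.Parameter(torch.randn(d, k) * 0.01)  # factor matrix

    def forward(self, x):                                # x: (batch, d)
        linear = self.w0 + self.w(x).squeeze(1)
        square_of_sum = (x @ self.V) ** 2                # (sum_i v_il x_i)^2
        sum_of_square = (x ** 2) @ (self.V ** 2)         # sum_i v_il^2 x_i^2
        pairwise = 0.5 * (square_of_sum - sum_of_square).sum(dim=1)
        return linear + pairwise
```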