Recommender Systems

Collaborative Filtering

In a broad sense, collaborative filtering (CF) is the process of filtering for information or patterns using techniques involving collaboration among multiple users, agents, and data sources. Overall, CF techniques can be categorized into memory-based CF, model-based CF, and their hybrid.

Matrix Factorization

Matrix factorization is a class of collaborative filtering models. Specifically, the model factorizes the user-item interaction matrix (e.g., rating matrix) into the product of two lower-rank matrices, capturing the low-rank structure of the user-item interactions.

Let $\mathbf{R} \in \mathbb{R}^{m \times n}$ denote the interaction matrix with $m$ users and $n$ items, where the values of $\mathbf{R}$ represent explicit ratings. The user-item interactions are factorized into a user latent matrix $\mathbf{P} \in \mathbb{R}^{m \times k}$ and an item latent matrix $\mathbf{Q} \in \mathbb{R}^{n \times k}$. For a given item $i$, the elements of $\mathbf{q}_i$ measure the extent to which the item possesses characteristics such as the genres and languages of a movie. For a given user $u$, the elements of $\mathbf{p}_u$ measure the extent of the user's interest in the items' corresponding characteristics.
The predicted ratings can be estimated by

\hat{\mathbf{R}} = \mathbf{P}\mathbf{Q}^\top

where $\hat{\mathbf{R}} \in \mathbb{R}^{m \times n}$ is the predicted rating matrix. One major problem of this prediction rule is that user and item biases cannot be modeled, so user-specific and item-specific bias terms $b_u$ and $b_i$ are introduced, giving $\hat{\mathbf{R}}_{ui} = \mathbf{p}_u \mathbf{q}_i^\top + b_u + b_i$. The model is then trained by minimizing the regularized squared error between predicted and observed ratings:

\underset{\mathbf{P}, \mathbf{Q}, b}{\mathrm{argmin}} \sum_{(u, i) \in \mathcal{K}} \| \mathbf{R}_{ui} - \hat{\mathbf{R}}_{ui} \|^2 + \lambda \left( \| \mathbf{P} \|^2_F + \| \mathbf{Q} \|^2_F + b_u^2 + b_i^2 \right)

where $\mathcal{K}$ is the set of observed user-item pairs and $\lambda$ denotes the regularization rate.
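The prediction rule and objective translate directly into a small model. Below is a minimal PyTorch sketch (not from the referenced chapter) of matrix factorization with user/item biases; the sizes and hyperparameters are illustrative assumptions.

```python
import torch
from torch import nn

class MatrixFactorization(nn.Module):
    """Predicts R_hat[u, i] = p_u . q_i + b_u + b_i."""
    def __init__(self, num_users, num_items, k=30):
        super().__init__()
        self.P = nn.Embedding(num_users, k)    # user latent factors
        self.Q = nn.Embedding(num_items, k)    # item latent factors
        self.b_u = nn.Embedding(num_users, 1)  # user bias
        self.b_i = nn.Embedding(num_items, 1)  # item bias

    def forward(self, users, items):
        dot = (self.P(users) * self.Q(items)).sum(dim=1)
        return dot + self.b_u(users).squeeze(1) + self.b_i(items).squeeze(1)

# Squared error on observed (user, item, rating) triples; weight_decay plays
# the role of the lambda regularizer in the objective above.
model = MatrixFactorization(num_users=1000, num_items=1700)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-5)
loss_fn = nn.MSELoss()

users, items = torch.tensor([0, 1]), torch.tensor([10, 42])
ratings = torch.tensor([4.0, 3.0])
loss = loss_fn(model(users, items), ratings)
loss.backward()
optimizer.step()
```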

AutoRec

AutoRec is an autoencoder-based collaborative filtering model. The item-based variant takes the partially observed rating column $\mathbf{R}_{*i}$ of item $i$ as input and reconstructs it:

h(\mathbf{R}_{*i}) = f(\mathbf{W} \cdot g(\mathbf{V} \mathbf{R}_{*i} + \mu) + b)

where $f(\cdot)$ and $g(\cdot)$ are activation functions, $\mathbf{W}$ and $\mathbf{V}$ are weight matrices, and $\mu$ and $b$ are bias terms.

The objective of AutoRec is

\underset{\mathbf{W},\mathbf{V},\mu, b}{\mathrm{argmin}} \sum_{i=1}^{M} \parallel \mathbf{R}_{*i} - h(\mathbf{R}_{*i}) \parallel_{\mathcal{O}}^2 + \lambda \left( \| \mathbf{W} \|_F^2 + \| \mathbf{V} \|_F^2 \right)

where $\| \cdot \|_{\mathcal{O}}$ means that only the contribution of observed ratings is considered.
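A minimal item-based AutoRec sketch in PyTorch, assuming a sigmoid encoder activation $g$ and an identity decoder activation $f$; the hidden size and the masking helper are illustrative, not from the referenced chapter.

```python
import torch
from torch import nn

class AutoRec(nn.Module):
    """h(R_*i) = f(W . g(V R_*i + mu) + b), reconstructing one item's rating column."""
    def __init__(self, num_users, hidden=500):
        super().__init__()
        self.encoder = nn.Linear(num_users, hidden)  # V and mu
        self.decoder = nn.Linear(hidden, num_users)  # W and b

    def forward(self, r_col):                        # r_col: (batch, num_users)
        return self.decoder(torch.sigmoid(self.encoder(r_col)))

def observed_mse(r_col, r_hat, mask):
    """Squared error restricted to observed entries (the ||.||_O term)."""
    diff = (r_col - r_hat) * mask                    # mask is 1 where a rating exists
    return (diff ** 2).sum() / mask.sum().clamp(min=1)
```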

Bayesian Personalized Ranking

BPR is a pairwise personalized ranking loss derived from the maximum posterior estimator. Its training data consists of triples $(u, i, j)$, meaning that user $u$ prefers the observed item $i$ over the unobserved item $j$. The BPR optimization criterion is

\begin{split}\begin{aligned}
\textrm{BPR-OPT} : &= \ln p(\Theta \mid >_u) \\
& \propto \ln p(>_u \mid \Theta) p(\Theta) \\
&= \ln \prod_{(u, i, j) \in D} \sigma(\hat{y}_{ui} - \hat{y}_{uj}) p(\Theta) \\
&= \sum_{(u, i, j) \in D} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \ln p(\Theta) \\
&= \sum_{(u, i, j) \in D} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) - \lambda_\Theta \|\Theta\|^2
\end{aligned}\end{split}

where $D$ is the set of training triples, $\sigma$ is the sigmoid function, $\hat{y}_{ui}$ is the predicted score of user $u$ for item $i$, and $\lambda_\Theta$ is a model-specific regularization parameter.
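The last line of the derivation is what gets maximized in practice. A minimal PyTorch sketch of the (negated) BPR loss for a batch of triples follows; the regularizer is assumed to be handled by weight decay.

```python
import torch
import torch.nn.functional as F

def bpr_loss(y_ui, y_uj):
    """Negative BPR-OPT for a batch: -sum ln sigma(y_ui - y_uj).

    y_ui: predicted scores for items the user interacted with (positives)
    y_uj: predicted scores for sampled unobserved items (negatives)
    """
    return -F.logsigmoid(y_ui - y_uj).sum()

# Example: positive items should be scored above the sampled negatives.
loss = bpr_loss(torch.tensor([2.0, 1.5]), torch.tensor([0.3, 1.8]))
```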

Another loss used for ranking in recommender systems is the hinge loss:

\sum_{(u, i, j) \in D} \max(m - \hat{y}_{ui} + \hat{y}_{uj}, 0)

where $m$ is the safety margin size. It aims to push unobserved (negative) items away from observed (positive) items.
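A corresponding sketch of the pairwise hinge ranking loss, with the margin $m$ as a parameter (function name is illustrative):

```python
import torch

def hinge_ranking_loss(y_ui, y_uj, margin=1.0):
    """sum max(m - y_ui + y_uj, 0): the positive item i should outscore
    the negative item j by at least the margin m."""
    return torch.clamp(margin - y_ui + y_uj, min=0).sum()
```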

NeuMF

This model leverages the flexibility and non-linearity of neural networks to replace the dot product of matrix factorization, aiming to enhance model expressiveness. Specifically, the model is structured with two subnetworks, generalized matrix factorization (GMF) and an MLP, and models the interactions through these two pathways instead of a simple dot product. The outputs of the two subnetworks are concatenated to compute the final prediction scores.
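A compact PyTorch sketch of the two-pathway structure; the embedding size, MLP widths, and the use of separate embeddings for the two branches are assumptions for illustration, not the authors' exact configuration.

```python
import torch
from torch import nn

class NeuMF(nn.Module):
    """Two pathways: GMF (element-wise product of embeddings) and an MLP;
    their outputs are concatenated for the final prediction."""
    def __init__(self, num_users, num_items, k=16, mlp_dims=(32, 16, 8)):
        super().__init__()
        self.P_gmf = nn.Embedding(num_users, k)
        self.Q_gmf = nn.Embedding(num_items, k)
        self.P_mlp = nn.Embedding(num_users, k)
        self.Q_mlp = nn.Embedding(num_items, k)
        layers, in_dim = [], 2 * k
        for d in mlp_dims:
            layers += [nn.Linear(in_dim, d), nn.ReLU()]
            in_dim = d
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(k + mlp_dims[-1], 1)

    def forward(self, u, i):
        gmf = self.P_gmf(u) * self.Q_gmf(i)                               # GMF pathway
        mlp = self.mlp(torch.cat([self.P_mlp(u), self.Q_mlp(i)], dim=1))  # MLP pathway
        return torch.sigmoid(self.out(torch.cat([gmf, mlp], dim=1))).squeeze(1)
```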

Sequence-Aware Recommender Systems

Caser, short for convolutional sequence embedding recommendation model, adopts convolutional neural networks to capture the dynamic pattern influences of users' recent activities. The main component of Caser consists of a horizontal convolutional network and a vertical convolutional network, aiming to uncover union-level and point-level sequence patterns, respectively. The goal of Caser is to recommend items by considering a user's general tastes as well as short-term intentions.
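A minimal sketch of the two convolutional parts, assuming a sequence of the user's last $L$ items; the class name, filter counts, and how the resulting representation is consumed downstream (e.g., concatenated with a user embedding and fed to fully connected layers) are illustrative assumptions.

```python
import torch
from torch import nn

class CaserEncoder(nn.Module):
    """Treats the embeddings of a user's last L items as an L x k 'image'.
    Horizontal filters (height 1..L, full width) capture union-level patterns;
    a vertical filter (height L, width 1) captures point-level patterns."""
    def __init__(self, num_items, k=16, L=5, n_h=4, n_v=2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, k)
        self.h_convs = nn.ModuleList(
            [nn.Conv2d(1, n_h, kernel_size=(h, k)) for h in range(1, L + 1)])
        self.v_conv = nn.Conv2d(1, n_v, kernel_size=(L, 1))

    def forward(self, seq):                          # seq: (batch, L) recent item ids
        E = self.item_emb(seq).unsqueeze(1)          # (batch, 1, L, k)
        h_outs = [torch.relu(c(E)).squeeze(3).max(dim=2).values for c in self.h_convs]
        v_out = torch.relu(self.v_conv(E)).flatten(start_dim=1)
        return torch.cat(h_outs + [v_out], dim=1)    # sequence representation
```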

Factorization Machines

The strengths of factorization machines over linear regression and matrix factorization are: (1) they can model $\chi$-way variable interactions, where $\chi$ is the polynomial order and is usually set to two; (2) a fast optimization algorithm associated with factorization machines reduces the polynomial computation time to linear complexity, making them extremely efficient, especially for high-dimensional sparse inputs.
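The linear-time trick referenced in (2) rewrites the pairwise interaction term so that it never enumerates feature pairs. A sketch in PyTorch, with $\mathbf{V} \in \mathbb{R}^{n \times k}$ as the factor matrix (names are illustrative):

```python
import torch

def fm_pairwise(x, V):
    """Second-order FM term sum_{i<j} <v_i, v_j> x_i x_j in O(nk) instead of O(n^2 k),
    using 0.5 * sum_f [ (sum_i v_{if} x_i)^2 - sum_i v_{if}^2 x_i^2 ].
    x: (batch, n) feature vectors, V: (n, k) factor matrix."""
    square_of_sum = (x @ V) ** 2            # (batch, k)
    sum_of_squares = (x ** 2) @ (V ** 2)    # (batch, k)
    return 0.5 * (square_of_sum - sum_of_squares).sum(dim=1)
```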

References

https://d2l.ai/chapter_recommender-systems/index.html