In this post we are going to develop a java recommender application with implicit feedback for an Online Retail. In previous post we developed two java recommender application one for books and one for movies . What is new in this post is that we do not have the luxury of explicit feedback like ratings but rather implicit feedback like for example clicks, time stayed , view times and so on… This is a very common situation when usually the amount of implicit feedback outweighs the amount of explicit feedback(personally guilty on my Amazon purchases).
Nature of Implicit Feedback
As online retail business is growing by having large inventory items, unfolds the challenge of suggesting the user only what it will be most interesting and useful from selling perspective. According to this article Amazon experienced an increase with 29% from helpful suggestions so their impact in business maybe considerable. In same time another article suggests that although recommender systems can be very useful we need significant data and knowledge about our users in order to have good results.
Most of the times we are faced with very few Explicit Feedback(EF) like ratings or likes but fortunately on the other hand we have user activities like clicks,view, time spent , buying history. Implicit feedback in contrast to EF doesn’t offer direct preferences clarity. A rating scale from 1-5 can give a very clear feedback how a much a user prefers certain genre of movie or book. On the other hand the fact that a user bought or viewed a book does not necessary mean that the user liked it(maybe is a gift). Also the same is true when user hasn’t clicked or seen a movie doesn’t mean he doesn’t like it.
Although implicit feedback introduces some uncertainty most of the times provides useful insight. For example a user that bought a book from an author several times probably prefers that author books. Or if a user stayed long time reading an item review or description or even clicked and return several times than probably this item category is of interest and attracts him for the moment.
Regardless of the situation is obvious that an implicit feedback system offers a level of confidence on user preferences in contrast with EF which offers the preferences itself.
EF systems offer both positive and negative feedback on a specific scale like ratings 1-5. As consequence when implemented EF systems will take in consideration only data that user rated and ignore not known data. This help the algorithm to scale good and in same time perform well. While with implicit feedback we have no negative feedback but only positive feedback like user activity clicks, purchase history. Therefore we cannot simply ignore zero activity items as they may be of great interest to the user.This leads to algorithm processing large data and not scale good when the input increases. Fortunately this paper suggest an optimization which speeds up the processing time.
So in a few words we need a modification to our previous method using EF in order to address :
- Confidence in a Preference instead of Preference
- Scalability when zero cases are considered
Collaborative Filtering for Implicit Feedback
This section is a short humble explanation of the paper Collaborative Filtering for Implicit Feedback Datasets by Hu, Koren, and Volinsky which gives the model and solution for this problem. It is worth to mention that also Apache Spark MLib uses this paper as reference for the implementation of Collaborative Filtering algorithm.
Recalling from previous post the cost function looked as below(here we are following paper semantics):
- X denotes the user and Xu is representing preferences weights for user u.For example if a user likes action and horror movies we have big weight for this genres otherwise if he doesn’t like romance we have low or zero weight for this genre.
- Similarly Y denotes items and Yi is representing features weight for item i.For example a movie can be 40% action, 20% horror, 10% romance, 30% Sc-Fi.
- r denotes to actual given ratings and rui is rating from user u on item i. Intuitively we can see that multiplying XuTyi gives rating prediction for user u on item i. For example If we know that a user likes action and horror movies and a movie is 0.6 action and 0.3 horror than this user will probably prefer it.
- Ignoring the second term for a moment we can see that what the function is doing is just calculating the difference of actual rates with prediction rates(squared just to have absolute values).So basically is just calculating how well our predictions are doing in comparing to actual ratings.Ideally we want the difference to be zero or as small as possible.
- Since this is the cost function for EF it is applied only to known rates and ignoring what items are not rated by any user.
- The other term on the left is it is used for preventing over fitting of the data. Although the algorithm works without it, it is important in real world because acts as regularizing factor. We control the impact by providing different values of λ.
- In the end we use this cost function in Logistic Regression algorithm to minimized in such way that difference between rates and prediction to be zero or as small as possible. Please find here an more detailed explanation from previous post.
In this section we are going to modify above equation to adapt to IF problem. First we start by defining the preference pui of user u on item i:
- For IF rui represent user’s u interaction on item i in contract to EF which was rather the rating/direct preference on a specific scale.
- As we can see pui is simply zero when we user u did not interact with item i and 1 if user u interacted at least once with item i.
- As we mention above IF is characterized by certain uncertainty as it offers a confidence level on user preference in an item in contrast with EF offering user preference in an item. Anyway if the interaction of a user on an item increases also our confidence that the user may liked should also increase.So this brings us to below simple formula: cui is now the confidence and as we can see it increases linearly with the increase of user u interaction on item i rui . α is a rate that controls the increase of confidence when interaction is increased, paper suggest 40 as a good value.
- Now we just need to do a small modification to above formula and put everything together as below:If we compare with first equation the only difference are:
- Instead of rui know we have pui which is a binary variable, outputting 1 if there is an interaction and zero if no interaction.
- First term is now multiplying by confidence of user u on item i cui. As we know the first term is just measuring how well the prediction is doing comparing to actual values(ideally difference zero so we predicted what is was the real value).Lets suppose our prediction is not doing well and the difference is big like 10.Now by multiplying with confidence(which is high for big interaction ans small otherwise) we just dramatically increase the mistake if we have confidence on preference and slightly increase if we do not have that much confidence. By exaggerating the mistake or penalizing the cost function we give the signal that this prediction is important so Logistic Regression hast to work more on minimizing this value.
In a few words the modification is about giving more weight to preference that we have high confidence by increasing the cost of a prediction mistake. So Logistic Regression will have to minimize especially those preferences if it want to ever converge which it will no matter what as it is a convex function.
The online retail data used for the building the algorithm and application can be found here. There are 541.910 rows containing customer and product related data.
What will be used as user u interaction on item i rui is users purchase history on items. So if a user has a history buying a lot of some items type we will tend to think he likes similar items and may want to buy again. On the other hand if user buys something it doesn’t mean he necessarily likes it, after all it may be a gift or user was not satisfied with product. This uncertainty that buying history offers makes it good choice for our interaction source. The data will look like below:
Data need to be pre-processed and prepared because there are some mistakes here and there and in same time Spark MLib accepts only Integers has productId and userId.
Similar to what is described on previous post we are dividing the data into two groups(previous was divided on three : training data, cross validation, test data) training data and test data.
Training Data are randomly chosen as 80% from all data set and test data the other randomly 20%. As always training data are used for training the algorithm while test data are used to see how the algorithm performs with non seen data.
As method to evaluate we compare the prediction and the real values using the RMSE method described here.
What RMSE is doing is basically calculating the squared difference of the prediction and real value for all data. The squared is used in order to give more weight(exaggerate) to the differences between what algorithm predicted and what is the wanted value.
- feature size with default 150. Please note that training is slowed down noticable with the increase but also the RMSE is improved
- different ratio between training and test data with default 80% and 20% respectively.
- regularize param λ with default 0.01. Notice that increasing the value may increase RMSE but that does not necessarily mean our algorithm is behaving bad. Sometimes we it happens our algorithm performs perfect on training and test but than when faced with production data it performs poor. So increasing this value make the algorithm more tolerant and avoids above situation(over-fit of data).
Application already loads a default training executed before hand with RMSE 42 and 350 features, 80% training and 20% test and reg param 0.01(that is why it may take some seconds for app to load).