Mon Mar 20 2023

A Complete Guide to Recommendation Systems by Xomnia

By Tulio Carreira

Recommendation systems influence the way we shop online, stream movies, listen to music, and carry out many other activities. They analyze data about users, as well as the items or services in question, to predict what a user is most likely to be interested in. These systems then suggest products, services, or other items based on users' preferences and previous actions, in combination with the items’ characteristics.

Netflix uses a recommendation engine to suggest movies you might like to watch, and Amazon uses one to offer you products you’re likely to buy. By providing customers with recommendations that match their needs or desires, recommendation engines can increase conversion rates and unlock extra revenue at checkout, while giving customers a more personalized experience. That’s why well-designed recommendation engines are highly desirable for online commerce businesses.

In this blog post, we'll delve deeper into what recommendation systems are, how they work, and some of the challenges and benefits associated with using them.

How can businesses benefit from using recommendation systems?

Recommendation systems save time and resources and help businesses scale operations more efficiently by automating the process by which items or experiences are recommended to users.

By recommending products that customers are more likely to purchase, recommendation systems help businesses increase their conversion rates and unlock extra revenue. Recommendation engines can also improve customer retention: customers who receive suggestions that align well with their personal preferences tend to become more loyal to the brand.

Recommendation systems can provide businesses with rich insights into customer behavior and preferences, as they generate rankings of item preferences for different user profiles. By analyzing such data, businesses can better understand user preferences and consumption patterns to make more informed decisions.

For customers, the most significant benefit of recommendation systems is access to more personalized content or items that are tailored to their own tastes. This, in turn, leads to a more satisfying and engaging experience. Additionally, recommendation systems can help users find new content that they might like but would otherwise struggle to discover. In other words, when developed carefully, recommendation systems can increase exposure to different perspectives.

What are the different types of recommendation systems?

The most common approaches to recommendation systems are collaborative filtering (user-to-user) and content-based filtering (item-to-item). Both give recommendations to users based on data, but they differ in the way they process that data.

Collaborative filtering and content-based filtering comparison (Source)

1) Collaborative filtering

This is the technique used by most recommendation systems, and it does not require deep analysis of items because it relies on the opinions of similar users. It groups users based on their item ratings, activities, and feedback, giving recommendations based on the group(s) that each user falls within. For example, if user A watched ten different movies on their streaming service account and user B watched six of those ten movies, we could say that users A and B have similar tastes. Therefore, the collaborative filtering recommendation system will recommend to user B the remaining four movies that were watched by user A.

Some collaborative filtering recommendation systems employ the matrix factorization technique, which learns a compact representation of users and items from the rating data. It is particularly useful for recommending items to users with few ratings (whose preferences are largely unknown at a given platform) as well as for recommending items that are dissimilar to items the user has liked in the past.

2) Content-based filtering

This approach relies solely on the features of the items when suggesting them to users based on what items these users have previously liked. For example, if a user has watched a lot of comedies, the content-based movie recommender system is likely to suggest another comedy movie. Similarly, if a user has read several romance novels in the past, the system may recommend other romance novels based on similarities in the books’ characteristics.

3) The Hybrid filtering approach

Some recommendation systems implement a hybrid approach that combines both collaborative and content-based filtering. A hybrid recommendation system could, for instance, use content-based filtering to recommend items to new users and collaborative filtering to recommend items to users that have more established preferences on the platform.
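The switching strategy described above can be sketched in a few lines. This is only an illustration: the function name `hybrid_recommend`, the `min_ratings` threshold, and the placeholder recommendation lists are assumptions made for the example, not part of any specific library.

```python
def hybrid_recommend(
    num_ratings: int,
    content_based_recs: list[str],
    collaborative_recs: list[str],
    min_ratings: int = 5,  # hypothetical threshold, to be tuned per platform
) -> list[str]:
    """Pick the recommendation source based on how much we know about the user."""
    if num_ratings >= min_ratings:
        # Established user: enough ratings to find similar users
        return collaborative_recs
    # New user: fall back to item characteristics
    return content_based_recs


# Placeholder outputs of the two underlying recommenders
content_based = ["Song with similar features"]
collaborative = ["Song liked by similar users"]

new_user_recs = hybrid_recommend(0, content_based, collaborative)
established_user_recs = hybrid_recommend(42, content_based, collaborative)
```

Real hybrid systems often blend the two score lists instead of switching between them; the hard switch is simply the easiest variant to read.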

4) Other approaches

Recommendation systems employ a few other approaches that are worth mentioning briefly:

  • Knowledge-based systems are more specific recommendation systems that are based on explicit knowledge about users’ preferences and recommendation criteria. These systems are particularly useful when the collaborative-filtering and content-based filtering approaches are not applicable, when the attributes of the items are too complex, or when the user's preferences are well-defined in advance.
  • Demographic recommendation systems, on the other hand, give recommendations based on age, gender, and location, which can help in giving more personalized recommendations. The downside of this system is that it requires access to personal data that not all users are willing to share.

How to implement recommendation systems?

Recommendation systems are often based on cosine similarity calculations, which measure the similarity between the numerical vectors that represent the items being compared. For a more detailed and mathematical explanation of cosine similarity, please refer to this page.
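As a minimal sketch of the calculation itself, the cosine similarity of two vectors is their dot product divided by the product of their lengths. The function name `cosine` and the two example vectors are illustrative assumptions.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Two hypothetical item feature vectors
song_a = np.array([1.0, 0.0, 1.0])
song_b = np.array([1.0, 1.0, 0.0])

same_direction = cosine(song_a, 2 * song_a)  # scaled copy of the same vector
partial_overlap = cosine(song_a, song_b)     # one shared feature out of two
```

Because the measure depends only on direction, scaling a vector does not change the result: `same_direction` is 1, while `partial_overlap` is 0.5.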

The code below shows how we would implement a content-based music recommendation engine that suggests songs based on an input song. It assumes that the chosen features (see examples here) are already engineered and ready to be used for analysis. For the sake of demonstration, this function calculates the cosine similarity between the input song and each of the songs in the dataset by using the cosine similarity function provided by the scikit-learn library. The songs are then ranked by descending similarity, and the most similar ones are returned by the function.

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Load data
songs_dataset = pd.read_csv('songs.csv')

def get_top_100_songs(song_title: str, songs_df: pd.DataFrame) -> pd.DataFrame:
  """
  This function calculates the cosine similarities between songs in order to identify the ones that are the most similar to an input song.
  Args:
      song_title (str): The song title to be paired against other songs.
      songs_df (pd.DataFrame): The dataset containing all songs available.
  Returns:
      pd.DataFrame: The output data frame containing the top 100 song recommendations to someone that likes the input song.
  """

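  # Use only the feature columns (assumes every column except 'title' is numeric)
  features = songs_df.drop(columns=['title']).to_numpy()

  # Retrieve the row position and features of the input song based on its title
  matches = np.flatnonzero(songs_df['title'].to_numpy() == song_title)
  if matches.size == 0:
    raise ValueError(f"Song not found: {song_title}")
  song = features[matches[0]].reshape(1, -1)

  # Calculate the cosine similarities between input song vs songs in the dataset
  cosine_sim = cosine_similarity(features, song).ravel()

  # Add the similarities to a copy (the caller's data frame stays unchanged)
  # and drop the input song itself from the candidates
  df_recommendations = songs_df.assign(similarity=cosine_sim)
  df_recommendations = df_recommendations.drop(index=songs_df.index[matches[0]])

  # Order songs by similarity and return top 100 most similar songs
  return df_recommendations.sort_values(by='similarity', ascending=False).head(100)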

# Example usage: What are the 100 most similar songs to 'Happy Birthday Song'?
get_top_100_songs('Happy Birthday Song', songs_dataset)

Pro tip: In a real-world application, the cosine similarity calculation could be implemented outside of the function in order to obtain similarities between all pairs of songs at once, improving code efficiency.
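A rough sketch of that idea, using a hypothetical toy dataset and plain NumPy: all pairwise similarities are computed once, and each lookup then only reads a row of the resulting matrix. The titles, feature values, and the helper `most_similar` are assumptions made for illustration. Calling scikit-learn's `cosine_similarity(features)` with a single argument yields the same pairwise matrix.

```python
import numpy as np

# Hypothetical toy data: three songs described by two numeric features
titles = ["Song A", "Song B", "Song C"]
features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])

# Normalise each row to unit length; dot products of unit vectors are
# cosine similarities, so one matrix product covers all pairs of songs.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = unit @ unit.T  # shape: (n_songs, n_songs)


def most_similar(title: str, n: int = 1) -> list[str]:
    """Look up the precomputed matrix instead of recomputing similarities."""
    i = titles.index(title)
    order = np.argsort(similarity[i])[::-1]          # most similar first
    return [titles[j] for j in order if j != i][:n]  # skip the song itself


closest_to_a = most_similar("Song A")
```

The trade-off is memory: the matrix grows with the square of the number of songs, so very large catalogs usually need approximate nearest-neighbour search instead.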

In the collaborative filtering approach, however, the goal is to recommend items to a user based on what similar users like. Our solution would be broadly similar to the content-based approach, with some tweaks:

  1. We start with a user-item matrix consisting of the ratings that each user gave to each item. Most cells in this matrix will be null because not every user has given an opinion on every item; therefore, we can fill these cells with 0.
  2. Then, we calculate the cosine similarity between the rows of this user-item matrix and sort the users by similarity to the input user (e.g., Bob), retaining the users that are most similar to him.
  3. Then, we want to predict the rating that Bob would give to the items he hasn’t rated yet based on the opinions of similar users. So, for each unrated item, e.g., Titanic, we go over the list of similar users and calculate the average rating they gave it. If Alice, Fred, and Ross are similar users to Bob, and they rated this movie with 8, 7, and 9, respectively, we would predict that Bob would give Titanic a rating of 8.

Using the average ratings calculated for all the movies that Bob has not yet watched, the recommendation system then sorts them by predicted rating in descending order and returns the movies with the highest predicted ratings.
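The steps above can be sketched as follows. The rating matrix is made-up data chosen so that Alice, Fred, and Ross come out as Bob's nearest neighbours; the user "Zed", the movie titles other than Titanic, and the function `predict_ratings` are assumptions for illustration. The sketch also assumes every user has rated at least one item, since an all-zero row cannot be normalised.

```python
import numpy as np

users = ["Bob", "Alice", "Fred", "Ross", "Zed"]
items = ["Inception", "Avatar", "Titanic"]

# Step 1: user-item matrix, with 0 standing for "not rated"
ratings = np.array([
    [9, 2, 0],  # Bob has not rated Titanic
    [9, 2, 8],  # Alice
    [8, 2, 7],  # Fred
    [9, 3, 9],  # Ross
    [1, 9, 2],  # Zed has very different tastes
], dtype=float)


def predict_ratings(user: str, k: int = 3) -> dict[str, float]:
    """Predict ratings for unrated items from the k most similar users."""
    u = users.index(user)

    # Step 2: cosine similarity between the input user and every user
    unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
    sims = unit @ unit[u]
    sims[u] = -np.inf  # never pick the user as their own neighbour
    neighbours = np.argsort(sims)[::-1][:k]

    # Step 3: average the neighbours' ratings for each unrated item
    predictions = {}
    for j, item in enumerate(items):
        if ratings[u, j] == 0:
            given = ratings[neighbours, j]
            given = given[given > 0]  # ignore neighbours who did not rate it
            if given.size:
                predictions[item] = float(given.mean())

    # Highest predicted rating first
    return dict(sorted(predictions.items(), key=lambda kv: kv[1], reverse=True))


bob_predictions = predict_ratings("Bob")
```

With this data the three neighbours rated Titanic 8, 7, and 9, so the predicted rating for Bob is 8. A common refinement is to weight each neighbour's rating by their similarity instead of taking a plain average.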

How do you evaluate and optimize recommendation systems?

There is a handful of metrics that can be used to evaluate recommendation systems. With the appropriate metrics, it is possible to improve the performance of the recommendation systems in order to make their suggestions more relevant to the user.

Precision, recall, and F1 score are well-known and widely used metrics not only in the development of recommendation systems but also for other machine learning models. MAP, NDCG, and coverage, on the other hand, are more specific to assessing ranking quality and recommendation quality. Please note that the metrics below assume that you have a ground truth dataset with items labeled as relevant recommendations or not:

1) Precision:

Measures the proportion of recommended items that are actually relevant to the user (number of relevant recommendations divided by the total number of recommendations).

2) Recall:

Measures the proportion of relevant items that were recommended to the user (number of relevant recommendations divided by the total number of relevant items).

3) F1 score:

This metric combines precision and recall into a single number. It is given by the harmonic mean of precision and recall: F1 score = 2 * (precision * recall) / (precision + recall)
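As a small worked example of the first three metrics, the sketch below computes them for one user, using the standard harmonic-mean form F1 = 2 * precision * recall / (precision + recall). The item identifiers and the function name `precision_recall_f1` are made up for illustration.

```python
def precision_recall_f1(
    recommended: list[str], relevant: set[str]
) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one user's recommendation list."""
    hits = len(set(recommended) & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean; defined as 0 when there are no hits at all
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


recommended = ["a", "b", "c", "d"]  # what the system suggested
relevant = {"a", "c", "e"}          # ground truth for this user

precision, recall, f1 = precision_recall_f1(recommended, relevant)
```

Two of the four recommendations are relevant (precision 0.5), and two of the three relevant items were found (recall of about 0.67), which gives an F1 score of 4/7, or about 0.57.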

4) Mean Average Precision (MAP):

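This metric takes into account the relevance of recommendations as well as their ranking. For each user, it averages the precision measured at every position where a relevant item appears, and then takes the mean across users, so it rewards lists in which the relevant items are ranked highly.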
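A minimal sketch of MAP under one common convention, in which each user's average precision is normalised by that user's number of relevant items. The rankings and ground-truth sets are invented for the example.

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@rank over the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(
    rankings: list[list[str]], relevants: list[set[str]]
) -> float:
    """Average of the per-user average precision scores."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ranked lists and ground truth for two users
rankings = [["a", "b", "c"], ["x", "y"]]
relevants = [{"a", "c"}, {"y"}]

ap_user_1 = average_precision(rankings[0], relevants[0])  # (1/1 + 2/3) / 2
map_score = mean_average_precision(rankings, relevants)
```

The first user scores about 0.83 because one relevant item sits at rank 1 and the other at rank 3; the second user scores 0.5 because the only relevant item is at rank 2.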

5) Normalized Discounted Cumulative Gain (NDCG):

Measures the ranking quality of the recommended items. It increases when relevant items are placed higher in a recommendation list.
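The sketch below shows one simple way to compute NDCG with a logarithmic position discount. As a simplifying assumption, the ideal ordering is built from the relevance scores of the recommended list itself; the scores are made up.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: relevance divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: list[float]) -> float:
    """DCG divided by the DCG of the best possible ordering of the same list."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance of each recommended item, in the order shown to the user
relevant_first = ndcg([1, 1, 0])
relevant_last = ndcg([0, 1, 1])
```

Both lists contain the same two relevant items, but `relevant_first` reaches the maximum value of 1 while `relevant_last` is lower, because the relevant items appear further down the list.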

6) Coverage:

Measures the proportion of items that are recommended to at least one user. This metric reflects the diversity of recommendations and is a good indicator of how well a recommendation system is dealing with the popularity bias and filter bubble.
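Coverage is straightforward to compute once the recommendation lists of all users are available. The following sketch uses a hypothetical four-item catalog and two users; the function name `catalog_coverage` is an assumption.

```python
def catalog_coverage(
    recommendations: dict[str, list[str]], catalog: set[str]
) -> float:
    """Share of catalog items that appear in at least one user's list."""
    recommended = set().union(*recommendations.values()) & catalog
    return len(recommended) / len(catalog) if catalog else 0.0


catalog = {"a", "b", "c", "d"}
recommendations = {
    "user_1": ["a", "b"],
    "user_2": ["a", "c"],
}

coverage = catalog_coverage(recommendations, catalog)
```

Here item "d" is never recommended to anyone, so the coverage is 3 out of 4, or 0.75.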

7) Click-Through Rate and Conversion Rate:

These metrics are useful in e-commerce and advertising use cases. Respectively, they measure the proportion of recommended items that are clicked on by a user and how many recommendations lead to a purchase or other desired action.
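Both rates are simple ratios over the number of recommendations shown. The counts below are made up, and the sketch assumes conversion is measured per recommendation shown, as described above; some teams measure it per click instead.

```python
def rate(events: int, impressions: int) -> float:
    """Share of shown recommendations that led to the given event."""
    return events / impressions if impressions else 0.0


# Hypothetical counts from one day of traffic
impressions = 1000  # recommendations shown
clicks = 50         # recommendations clicked
purchases = 5       # recommendations that led to a purchase

click_through_rate = rate(clicks, impressions)
conversion_rate = rate(purchases, impressions)
```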

If a ground truth dataset is not available, it is always a good idea to perform A/B tests by serving different versions of the recommendation system to different subsets of users. In other words, users can be randomly assigned to groups and each group is presented with different recommendations. The system version with the best performance (i.e. the system that made the best recommendations to its users) is then selected.
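One common way to split users into groups is to hash their identifier, which behaves like a random assignment but always puts the same user in the same group. This is a sketch under assumptions: the function name `assign_variant`, the variant labels, and the `salt` value are all hypothetical.

```python
import hashlib


def assign_variant(
    user_id: str,
    variants: tuple[str, ...] = ("A", "B"),
    salt: str = "recsys-test-1",  # hypothetical experiment name
) -> str:
    """Deterministically map a user to one of the system versions."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


variant = assign_variant("user-123")
```

Changing the salt for each new experiment reshuffles the groups, so the same users do not always end up together.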

How do businesses use recommendation systems?

Recommendation systems can be applied in several domains. The list below describes some of the common use cases in different industries:

1) E-commerce

Purchase history, browsing behavior, and user preferences can all be useful data for improving product recommendations to customers. For instance, data engineers and machine learning engineers from Xomnia worked together on the development of a recommendation system to suggest the best products for a given customer with the goal of boosting VodafoneZiggo’s B2B online experience. The recommender engine provides those suggestions based on the services used by similar existing customers.

2) Online advertising

This includes recommending personalized ads to users based on their profile, demographics, interests, and browsing history. For example, one of the solutions that Xomnia provided to improve IceMobile's app experience involved a recommendation system that created a personalized campaign for each individual user. The input data included the purchase transaction history, shopping behavior, frequency of visits, the number of products purchased per visit, and the characteristics of each customer.

3) Entertainment

Most streaming services - such as Spotify and Netflix - recommend songs and movies/TV shows to users based on their ratings, preferences, and listening/watching history and habits.

4) Social media

Recommending posts, pages, and people to follow based on users’ activity and interests is now a common practice in applications such as Facebook, Instagram, and TikTok.

5) Services, education, and leisure

Recommendation systems can also be used in tourism applications to suggest hotels, flights, and activities. In education, they can suggest courses and learning resources to students. In news and content, they can suggest appropriate articles and blogs, among other applications. In healthcare systems, they can, for example, motivate users to be healthier by suggesting habits and actionable knowledge based on observed user behavior.

Can recommendation systems be integrated into existing applications or websites?

Definitely! One example is the “suggested products based on your preferences” element present in most e-commerce websites nowadays. Some of these are plugins or extensions that query APIs that communicate directly with a recommendation engine. The API reads the profile of the current user browsing the website, the items that they prefer, or both, and sends that data to the recommendation engine, which is responsible for sending back a list of items that the user will potentially enjoy. These plugins help users discover items or content related to their preferences.

What are the potential challenges and limitations of recommendation systems?

1) Cold start problem:

Even though collaborative filtering does not require deep analysis of item features, since it relies on personal preferences from similar users, this system has a few limitations, such as the cold start problem.

This problem happens when new users who haven't provided any rating or feedback cannot be associated with other users, making it hard for the system to make recommendations for them. For the same reason, collaborative filtering systems can’t recommend new items, since those items do not have ratings yet.

How to mitigate it: Some systems ask users to provide explicit feedback or preferences to build a profile from scratch. That way, collaborative filtering systems can already suggest items based on similar users’ preferences. Moreover, the content-based approach can mitigate the cold start problem on an item level: Items that are new to the platform and have no ratings can already be suggested to users because this approach relies on items’ characteristics instead of user ratings.

2) Popularity bias:

This is another issue that collaborative filtering algorithms face. Items that are not popular, and therefore have few or no ratings associated with them, will rarely or never be suggested to the users of the platform. This puts less popular products and services at a disadvantage.

How to mitigate it: Once again, the content-based approach mitigates this issue because it does not depend on user ratings. Items that are frequently overlooked might also be included in suggestion lists given that the content-based systems rely on item features instead of other users' likes and dislikes.

3) Filter bubbles:

The content-based approach may not always correctly perceive user preferences given the nuances and subjectivity of each individual user. For example, users A and B have similar preferences overall, but the same comedy movie might be liked by user A while disliked by user B. Additionally, this kind of system may keep on giving biased recommendations based on what the user already likes, therefore lacking novelty and not always effectively engaging the user over time.

How to mitigate them: Developers can ask users for feedback on the recommendations given in order to identify and correct biases, as well as increase the diversity of the recommendations by taking the risk of recommending items that are potentially out of the user’s comfort zone. Besides, collaborating with sociology, psychology, and ethics experts can ensure that the recommendations are given with a better understanding of their potential impact.

4) Ethical concerns:

Finally, there are ethical concerns related to recommendation systems, particularly regarding data privacy and potential manipulation.

How to mitigate them: If a system is recommending items based on user data, it is important that the user has control over their data, and that the system is transparent in how it makes recommendations.

Takeaway: Even though collaborative filtering systems are more common than content-based systems, both approaches have advantages and limitations. That is why sometimes recommender engines combine strategies from both approaches to benefit from their advantages while mitigating their issues.

What are some ethical issues associated with recommendation systems?

  1. Filter bubbles, as previously mentioned, happen when users are exposed to information that only reinforces their existing beliefs, even though those can sometimes be wrong, which increases the risk of disseminating fake news and limits exposure to diverse viewpoints and opinions. It is important that recommendation systems offer items out of the users’ comfort zone from time to time to diminish the impact of this issue.
  2. Manipulative or exploitative outcomes: If the developers behind a recommendation system have the sole goal of prioritizing the interests of the platform or the advertiser instead of also taking into account the users’ preferences, this can lead to manipulative or exploitative outcomes. Algorithmic bias, privacy concerns, and lack of transparency are further issues linked to recommendation systems. For more information on those issues, please refer to Xomnia’s blog post on responsible and ethical AI.

Conclusion

Recommendation systems are a useful tool that helps consumers decide which products to consume, books to read, playlists to listen to, etc. They provide personalized recommendations for users based on their previous actions, preferences, and other characteristics.

Modeling user preferences, however, is a complex task and there is probably no "holy grail" that works best in all cases. Therefore, it is key to understand that recommender engines don’t just come in the item-to-item or user-to-user form. By using your creativity, you can enrich your recommendation system with more features and come up with other methods by combining approaches into a new hybrid recommender system that might work very well for your specific case.

Last but not least, even though there are challenges and ethical concerns associated with recommender systems, if those are carefully mitigated, recommendation systems provide significant benefits and lead to increased engagement, loyalty, and exposure to new and relevant content.