Practical Recommender Systems · Kim Falk · 2019

Recommendations have become an important part of many products, so Product Managers need a good understanding of the underlying technology. The theory can be complex, but understanding the tradeoffs and limitations of your system is crucial.

Key Insights

Recommendations work best with large amounts of data, including historical behaviour data, content metadata, contextual data and demographic information. When describing a recommender system, consider the domain, purpose, context, personalisation, opinions, privacy, trustworthiness, interfaces, and algorithms.

There is a tradeoff between accuracy and explainability. Data quality is crucial, and feedback can be explicit (ratings, likes) or implicit (inferred from activity). Implicit ratings based on usage data are often more trustworthy than explicit ones, because they reflect what users actually do rather than what they say.

Browsing events, such as page views and clicks, can signal interest, so storing as much of this data as possible is beneficial.

Collaborative filtering is suitable when user engagement is high, while content-based filtering works better for low user engagement or one-time visit products. Identifying users through sign-up or login can help recommenders. The conversion path is the sequence of pages and actions a user takes from arrival to conversion.

Implicit ratings can be computed by converting web behaviour into content ratings, which are often expressed in a user-item matrix. Empty cells (sparsity) are common, and recommender systems may need to predict values for them. Time-decay can be introduced to give recent signals higher importance.
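A minimal sketch of turning behaviour into implicit ratings with time decay. The event weights (`EVENT_WEIGHTS`) and the 30-day half-life are hypothetical values chosen for illustration, not figures from the book. The matrix is stored sparsely, so empty cells are simply missing keys, which also makes the sparsity easy to measure.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event weights; tune these for your own product.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "buy": 5.0}
HALF_LIFE_DAYS = 30.0  # assumed: a signal loses half its weight every 30 days


def implicit_ratings(events, now):
    """Turn (user, item, event_type, timestamp) tuples into a sparse user-item matrix."""
    matrix = defaultdict(dict)  # user -> {item: score}; unseen pairs stay absent
    for user, item, event_type, ts in events:
        age_days = (now - ts).total_seconds() / 86400
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # exponential time decay
        score = EVENT_WEIGHTS.get(event_type, 0.0) * decay
        matrix[user][item] = matrix[user].get(item, 0.0) + score
    return matrix


def sparsity(matrix):
    """Fraction of empty cells in the user x item grid."""
    items = {item for row in matrix.values() for item in row}
    filled = sum(len(row) for row in matrix.values())
    return 1.0 - filled / (len(matrix) * len(items))


now = datetime(2024, 1, 31)
events = [
    ("ann", "book", "buy", now),
    ("ann", "lamp", "view", now - timedelta(days=30)),
    ("bob", "book", "view", now),
]
ratings = implicit_ratings(events, now)
print(ratings["ann"])     # {'book': 5.0, 'lamp': 0.5}
print(sparsity(ratings))  # 0.25
```

The month-old view counts for half as much as a fresh one, so recent signals dominate without old ones disappearing entirely.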

TF-IDF (Term Frequency-Inverse Document Frequency) is built on the idea that terms which appear less frequently across a dataset are more informative for making recommendations or analysing content: a term's weight rises with how often it appears in one item and falls with how many items it appears in.
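The weighting can be sketched as below. This is the plain form (natural log, no smoothing); libraries often use smoothed variants, and the tag data is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document; each document is a list of tokens."""
    n_docs = len(documents)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency within this document x inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Item descriptions reduced to tags (hypothetical data).
docs = [["space", "robot", "robot"], ["space", "romance"], ["space", "western"]]
w = tf_idf(docs)
print(w[0]["space"])  # 0.0: appears in every item, so carries no information
```

"space" is shared by every item and gets weight zero, while the rarer "robot" and "romance" end up carrying the signal.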

Start with non-personalised recommendations, then graduate to personalised ones. Order content by perceived user interest, and use recency to keep the selection changing.

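Seeded recommendations use an item as input to find related items. Affinity analysis determines links between products, focusing on items that are frequently bought together with each other but not with everything else; an item that appears in almost every basket says little about any particular purchase.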
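One common way to formalise affinity analysis is with association rules, scored by support, confidence, and lift. Lift is one way to capture "bought together but not with everything else". The sketch below is an illustration under that assumption, not necessarily the book's exact formulation, and the baskets are made up.

```python
from collections import Counter
from itertools import combinations


def association_rules(baskets):
    """Score every rule x -> y found in the baskets by support, confidence and lift."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]  # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # > 1: together more than chance
            rules[(x, y)] = {"support": support, "confidence": confidence, "lift": lift}
    return rules


baskets = [
    ["beer", "crisps", "milk"],
    ["beer", "crisps", "milk"],
    ["milk", "bread"],
    ["milk", "eggs"],
]
rules = association_rules(baskets)
print(rules[("beer", "crisps")]["lift"])  # 2.0
print(rules[("beer", "milk")]["lift"])    # 1.0: milk is bought with everything
```

Both rules have confidence 1.0, because every beer buyer also bought crisps and milk. Lift separates them: milk is in every basket, so pairing it with beer tells you nothing.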

The cold start problem arises when there is insufficient knowledge of users or items for personalisation. Active learning can help select informative items for users to rate. Cold items are best handled using content-based filtering. User segments can be created using cluster analysis to find hidden traits and similarities. Metadata can be used to find similar items, but the level of abstraction must be carefully chosen. Personalised recommendations often involve similarity calculations, such as the Jaccard distance.
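Jaccard similarity is the size of the intersection divided by the size of the union; the Jaccard distance is one minus that. A minimal sketch follows, applied here to hypothetical metadata tags. The same functions work on sets of item ids.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of item ids or metadata tags."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty sets are not similar
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    return 1.0 - jaccard_similarity(a, b)


movie_a = {"sci-fi", "action", "dystopia"}
movie_b = {"sci-fi", "action", "comedy"}
print(jaccard_similarity(movie_a, movie_b))  # 0.5
print(jaccard_distance(movie_a, movie_b))    # 0.5
```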

User similarity is determined by the number of items two users have in common (the overlap between the items each has interacted with). Recommendations then come from unseen items: items that a similar user has interacted with but the target user has not yet seen. This works less well for "grey sheep", users with unusual tastes who overlap with few others, because there are no close neighbours to borrow recommendations from.
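A minimal user-based sketch, assuming Jaccard overlap as the similarity measure and hypothetical parameters `k` (neighbours kept) and `n` (items returned). Unseen items from similar users are scored by the similarity of the user who contributed them. The user "dan" shows the grey-sheep case: he shares nothing with anyone, so he gets no recommendations.

```python
from collections import Counter


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for(user, interactions, k=2, n=3):
    """interactions: dict user -> set of items. Returns up to n items the user hasn't seen."""
    seen = interactions[user]
    similarities = {
        other: jaccard(seen, items)
        for other, items in interactions.items()
        if other != user
    }
    # Keep the k most similar users that share at least one item.
    neighbours = sorted(
        (o for o, s in similarities.items() if s > 0),
        key=similarities.get,
        reverse=True,
    )[:k]
    scores = Counter()
    for other in neighbours:
        for item in interactions[other] - seen:  # unseen items only
            scores[item] += similarities[other]
    return [item for item, _ in scores.most_common(n)]


interactions = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "cat": {"a", "e"},
    "dan": {"x", "y"},  # a "grey sheep": shares nothing with anyone
}
print(recommend_for("ann", interactions))  # ['d', 'e']
print(recommend_for("dan", interactions))  # []
```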

In ML pipelines, as much work as possible is done offline, before the user visits. The choice between item-based and user-based filtering depends on the relative numbers of users and items. Data connectivity (how well users and items are linked through shared interactions) is crucial for recommendations. Collaborative filtering requires no domain knowledge, but it involves a tradeoff: narrowing the neighbourhood to only the most similar users can leave too few candidates, while widening it risks irrelevant recommendations. It struggles with new users or items, although workarounds exist. Its main strength is that recommendations are based on actual behaviour; its weaknesses include sparsity, difficulty with unique tastes, and a skew towards popular content.

The best recommender systems account for diversity, context, evidence, freshness, and novelty. They require ongoing maintenance and monitoring, and implementation should start simple and add complexity gradually. Leading indicators can be used if key success metrics are lagging. The Matthew effect (the popular get more popular) warns against always favouring popular items. Coverage measures how much of the catalogue (or how many of the users) the recommender can actually serve, while serendipity provides users with novel, unexpected experiences.

Error metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) measure the difference between historical ratings and algorithm predictions. Decision-support metrics like precision and recall evaluate the relevance of recommendations. Precision at k and normalised discounted cumulative gain (NDCG) are common ranking measures.
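The metrics can be sketched in a few lines each. This version of NDCG uses linear gains divided by log2(rank + 1); a variant with exponential gains (2^rel - 1) is also common. The catalogue-coverage function is one simple definition among several.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error between held-out ratings and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error; punishes large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top k."""
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def ndcg_at_k(recommended, relevance, k):
    """relevance: dict item -> graded relevance (missing items count as 0)."""
    def dcg(gains):
        # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([relevance.get(item, 0.0) for item in recommended[:k]])
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def catalogue_coverage(recommendation_lists, catalogue):
    """Fraction of the catalogue that appears in at least one recommendation list."""
    shown = {item for recs in recommendation_lists for item in recs}
    return len(shown & set(catalogue)) / len(catalogue)


print(mae([4, 3, 5], [3.5, 3, 4]))  # 0.5
```

NDCG rewards putting the most relevant items first: the same relevant items ranked lower score below 1.0.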

Latent factors, or hidden genres, can explain users' tastes with less granularity than individual items but more detail than genres. Dimension reduction through matrix factorisation, like UV-decomposition, enables the discovery of latent factors. Imputation techniques, such as mean filling or normalisation, can address the sparsity of rating matrices. Baseline predictors extract item and user biases to improve predictions.
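A minimal sketch of one common way to learn latent factors together with baseline predictors. It uses stochastic gradient descent on r_ui ≈ mu + b_u + b_i + p_u · q_i, in the style of Funk SVD. This is not necessarily the book's exact procedure, and the hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) and the toy ratings are assumptions.

```python
import math
import random


def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """ratings: list of (user, item, rating). Returns a predict(user, item) function.

    Model: r_ui ≈ mu + b_u + b_i + p_u · q_i, fitted with stochastic gradient descent.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = {u: 0.0 for u in users}  # user bias: how far this user rates above/below mu
    b_i = {i: 0.0 for i in items}  # item bias: how far this item is rated above/below mu
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}  # user factors
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}  # item factors

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return predict


ratings = [("ann", "a", 5), ("ann", "b", 3), ("bob", "a", 4),
           ("bob", "c", 1), ("cat", "b", 2), ("cat", "c", 1)]
predict = train_mf(ratings)
train_rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))
print(round(train_rmse, 3))
```

The biases absorb "this user rates generously" and "this item is widely liked", leaving the factor vectors to model the taste that remains. Training error alone says little about quality; in practice you would evaluate on held-out ratings.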

Hybrid recommenders combine components of various recommenders in three ways: monolithic (parts of several recommenders merged into one algorithm), ensemble (separate recommenders whose results are combined), and mixed (all results returned). Mixed hybrids return the union of results from a hierarchy of recommenders, with the most personalised recommender providing a few recommendations and less personalised ones contributing more. Feature-weighted linear stacking (FWLS) uses meta-features to turn the ensemble weights into functions, so the weighting can adapt to user characteristics.
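In FWLS, each model's weight is itself linear in the meta-features: w_i(x) = Σ_j v_ij f_j(x), so the blend is Σ_i Σ_j v_ij f_j(x) g_i(x). Fitting v is therefore an ordinary least-squares regression on the products f_j(x) g_i(x). A minimal pure-Python sketch, assuming two hypothetical models (a collaborative filter and a content-based model) and one hypothetical meta-feature, `is_new_user`, plus a constant term:

```python
import random


def fwls_row(model_preds, meta):
    """Expand one example into the products g_i(x) * f_j(x) that FWLS regresses on."""
    return [g * f for g in model_preds for f in meta]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, non-singular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_fwls(model_preds, meta_features, targets):
    """Least-squares fit of v[i][j] via the normal equations X^T X v = X^T y."""
    X = [fwls_row(g, f) for g, f in zip(model_preds, meta_features)]
    d = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(d)] for a in range(d)]
    Xty = [sum(row[a] * y for row, y in zip(X, targets)) for a in range(d)]
    v = solve(XtX, Xty)
    n_meta = len(meta_features[0])
    return [v[i * n_meta:(i + 1) * n_meta] for i in range(len(model_preds[0]))]


def predict_fwls(v, model_preds, meta):
    # Weight of model i depends on the meta-features: w_i(x) = sum_j v[i][j] * f_j(x).
    weights = [sum(vij * fj for vij, fj in zip(v_i, meta)) for v_i in v]
    return sum(w * g for w, g in zip(weights, model_preds))


rng = random.Random(0)
model_preds, meta_features, targets = [], [], []
for _ in range(40):
    cf, content = rng.uniform(1, 5), rng.uniform(1, 5)
    is_new_user = float(rng.random() < 0.5)
    model_preds.append([cf, content])
    meta_features.append([1.0, is_new_user])  # constant term + hypothetical meta-feature
    # Toy ground truth: trust CF for established users, content for new users.
    targets.append(content if is_new_user else cf)

v = fit_fwls(model_preds, meta_features, targets)
print([[round(x, 3) for x in row] for row in v])
```

On this toy data the fit recovers weights close to v = [[1, -1], [0, 1]]: the CF weight is 1 for established users and 0 for new ones, and the content weight does the opposite.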

Learning to rank (LTR) combines different data sources, such as popularity, distance, or recommender system outputs, to rank items. The aim is to find features that help produce an ordering of items. Foursquare's algorithm ranks nearby venues by inverting distances (so closer is better), setting a maximum distance, and rescaling the data to lie between 0 and 1. The system learns weights that produce the expected ordering of items on the page; because the ideal ordering is not known directly, check-ins are used as the signal to optimise against.
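A minimal sketch of the distance feature and a simple learner. The 2000 m cap, the use of check-in counts as a popularity feature, and the pairwise perceptron used to learn the weights are all assumptions for illustration. The source does not specify Foursquare's actual features or learning method.

```python
def rescale(values):
    """Min-max rescale a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def distance_scores(distances_m, max_distance_m=2000.0):
    """Cap distances at max_distance_m, then invert so closer venues score nearer 1."""
    return [1.0 - min(d, max_distance_m) / max_distance_m for d in distances_m]


def feature_rows(venues, max_distance_m=2000.0):
    dist = distance_scores([v["distance_m"] for v in venues], max_distance_m)
    pop = rescale([v["checkins"] for v in venues])
    return [[d, p] for d, p in zip(dist, pop)]


def train_pairwise(rows, preferred_pairs, epochs=20, lr=0.1):
    """Pairwise perceptron: (a, b) in preferred_pairs means venue a should rank above b."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for a, b in preferred_pairs:
            diff = [xa - xb for xa, xb in zip(rows[a], rows[b])]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:  # pair is mis-ordered
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(rows, w):
    scores = [sum(wi * xi for wi, xi in zip(w, row)) for row in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


venues = [
    {"name": "cafe", "distance_m": 100, "checkins": 50},
    {"name": "museum", "distance_m": 1500, "checkins": 900},
    {"name": "bar", "distance_m": 3000, "checkins": 300},  # beyond the cap: distance score 0
]
rows = feature_rows(venues)
# Hypothetical check-in evidence: cafe preferred over museum, museum over bar.
w = train_pairwise(rows, [(0, 1), (1, 2)])
print([venues[i]["name"] for i in rank(rows, w)])  # ['cafe', 'museum', 'bar']
```

Rescaling both features to [0, 1] puts them on the same scale, so the learned weights compare features rather than units.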



Quick Links

1998 PageRank paper from Larry and Sergey · Paper

10 forces shaping commerce · Article

Statistics 110 Harvard · Videos

10 Simple rules for making good presentations · Image

Collaborative Filtering for Implicit Feedback Datasets · Paper

How to build an AI startup · Article

Ben Franklin on the 13 necessary virtues · Article

Buttons on the web. Placement and order · Article

Rethinking product development for the age of intelligence · Article

37 Signals guide for internal communication · Article

A Comprehensive Overview of Software Product Management Challenges · Olga Springer & Jakub Miler · 2022

Product Management is a new discipline - and we’re still working things out. It’s taking a while for the theory to catch up with the reality. You may find the gap is wider if you’re in a traditional company far from Silicon Valley, or in an industry with barriers to innovation. To drive the Product Management community forward, it’s important that we identify and close that gap, which is why this paper is so interesting.

I encourage you to read the paper - but here are some of the results…


Book Highlights

A good platform provides standards, templates, APIs, and well-proven best practices for Dev teams to use to innovate rapidly and effectively. A good platform should make it easy for Dev teams to do the right things in the right way for the organization... Matthew Skelton, Manuel Pais, and Ruth Malan · Team Topologies

When you select a particular insight, you are taking the risk associated with the assumptions being wrong….The better the information you collected, and the further you have progressed your previous discovery, the greater your confidence will be about the hypothesis’s validity. Nacho Bassino · Product Direction

During discovery you’ll really dig deep into:
- Who the customers and users are you believe will use your solution
- How they meet their needs today without your solution
- How the world would change for them with your solution
- How your solution might look and behave
- How long your solution might take to build
Jeff Patton · User Story Mapping

Quotes and Tweets

Often the problem is not what happened but how it was communicated. James Clear

If you want a new idea, read an old book. Ivan Pavlov

I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so I can do my laundry and dishes. Joanna Maciejewska

Product #58