Ranking Basics: Pointwise, Pairwise, Listwise
Because thy neighbour matters
First, let’s talk about where ranking comes into play. Ranking is a big deal in e-commerce and search applications — essentially, any scenario where you need to organize documents based on a query. It’s a little different from classic classification or regression problems. For instance, in the Titanic dataset, you predict whether a passenger survives or not, and in house price prediction, you estimate the price of a house. But with ranking, the game changes. Instead of predicting a single value or category, you’re trying to order documents based on relevance.
Take an example: You search for “saree” on an e-commerce website like Amazon. You don’t just want a random list of sarees; you want the most relevant ones to appear at the top, right? That’s where Learning to Rank (LTR) steps in — it ranks documents (or products) based on how well they match your query.
Now that we know where ranking fits in, let’s dive into the nitty-gritty of different approaches and methods.
There are three main methods for Learning to Rank (LTR):
- Pointwise
- Pairwise
- Listwise
To make things easier to follow, let’s establish some notation that we’ll use to explain these methods.
We’ll work with a set of queries q1, q2, …, qn, where each query has a corresponding set of documents d1, d2, …, dm. For example:
- Query q1 is associated with documents d1, d2, d3
- Query q2 is associated with documents d4, d5
With this setup in mind, let’s break down each method and how they approach the ranking problem.
Pointwise
In the pointwise approach, we treat the ranking problem as a simple classification task. For each query-document pair, we assign a target label that indicates the relevance of the document to the query. For example:
- Label 1 if the document is relevant.
- Label 0 if the document is not relevant.
Using our earlier example, the data would look like this:
- (q1, d1) → label: 1
- (q1, d2) → label: 0
- (q1, d3) → label: 1
- (q2, d4) → label: 0
- (q2, d5) → label: 1
We train the model using this labeled data, leveraging features from both the queries and the documents to predict the label. After training, the model predicts the relevance of each document to a given query as a probability (ranging from 0 to 1). This probability can be interpreted as the relevance score.
For example, after training, the model might produce the following scores:
- (q1, d1) → score: 0.6
- (q1, d2) → score: 0.1
- (q1, d3) → score: 0.4
Using these scores, we re-rank the documents in descending order of relevance: d1,d3,d2. This new ranking order is then presented to the user, ensuring the most relevant documents appear at the top.
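To make the pointwise setup concrete, here is a minimal sketch that treats ranking as binary classification, assuming scikit-learn’s LogisticRegression as the classifier. The two-dimensional feature vectors stand in for real query-document features (e.g. a text-match score and a popularity signal); all numbers are made up for illustration.

```python
# Pointwise LTR as binary classification: a minimal sketch using scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per (query, document) pair, label 1 = relevant, 0 = not relevant.
X_train = np.array([
    [0.9, 0.2],  # (q1, d1) -> 1
    [0.1, 0.4],  # (q1, d2) -> 0
    [0.7, 0.3],  # (q1, d3) -> 1
    [0.2, 0.1],  # (q2, d4) -> 0
    [0.8, 0.6],  # (q2, d5) -> 1
])
y_train = np.array([1, 0, 1, 0, 1])

model = LogisticRegression().fit(X_train, y_train)

# Score the documents for q1 and re-rank them by predicted relevance.
q1_docs = ["d1", "d2", "d3"]
q1_features = X_train[:3]
scores = model.predict_proba(q1_features)[:, 1]  # probability of "relevant"
ranking = [doc for _, doc in sorted(zip(scores, q1_docs), reverse=True)]
print(ranking)  # e.g. ['d1', 'd3', 'd2']
```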
Pairwise
The main drawback of the pointwise approach is that it misses the context in which the user interacts with a document. When a user clicks on or finds a document relevant, there are often multiple factors at play — one of the most important being the neighboring items.
For instance, if a user clicks on a document, it might not necessarily mean that the document is highly relevant. It could simply be that the other documents presented were of poor quality. Similarly, if you had shown a different set of documents for the same query, the user’s interaction might have been entirely different.
Imagine presenting d4 for query q1. If d4 is more relevant than d1, the user might have clicked on d4 instead. This context, namely how documents compare to each other, is completely overlooked in the pointwise approach.
To capture this relative relevance, we turn to the pairwise approach.
In the pairwise method, instead of looking at query-document pairs in isolation, we focus on pairs of documents for the same query and try to predict which one is more relevant. This helps incorporate the context of comparison between documents.
We’ll generate the data similarly for now, but the way we use it will be slightly more complex. Let’s break that down next.
Imagine the training data for the pairwise approach structured as follows:
- q1, (d1, d2) → label: 1 (indicating d1 is more relevant than d2)
- q1, (d2, d3) → label: 0 (indicating d2 is less relevant than d3)
- q1, (d1, d3) → label: 1 (indicating d1 is more relevant than d3)
- q2, (d4, d5) → label: 0 (indicating d4 is less relevant than d5)
Here, the labels come from user interactions. For instance, since d1 and d3 were both clicked, both are treated as relevant; for simplicity in this explanation, we keep their original order relative to each other.
Model Training Process:
Although the training data is in pairs, the model doesn’t directly process these pairs. Instead, we treat it similarly to a classification problem, where each query-document pair is passed to the model separately.
For example:
- s1 = f(q1, d1)
- s2 = f(q1, d2)
- s3 = f(q1, d3)
The model generates scores s1,s2,s3 for the documents. These scores are used to compare the relevance of document pairs.
Penalizing the Model:
If the model predicts scores that violate the true order of relevance, it is penalized. For example:
- If s1 < s2, but the training data indicates d1 > d2, the model is penalized because it failed to rank d1 higher than d2.
- If s2 < s3, and the training data indicates d2 < d3, the model did the right thing, so no penalty is applied.
This pairwise comparison helps the model learn the relative order of documents for a query, rather than just predicting a standalone relevance score like in the pointwise approach.
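A common choice for this penalty (used by RankNet, for example) is a logistic loss on the score difference. The sketch below is illustrative rather than the exact formulation of any particular library, and the scores are reused from the earlier example.

```python
# RankNet-style logistic loss on a pair of scores: large when the order is
# violated, close to zero when the preferred document scores clearly higher.
import math

def pairwise_loss(s_i: float, s_j: float) -> float:
    """Loss for a pair where d_i should rank above d_j."""
    return math.log(1.0 + math.exp(-(s_i - s_j)))

s1, s2 = 0.6, 0.1
print(pairwise_loss(s1, s2))  # small: correct order, little penalty
print(pairwise_loss(s2, s1))  # larger: violated order, model gets penalized
```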
Challenges:
One of the main challenges of implementing pairwise models is the computational complexity — since we need to compare all possible pairs of documents, the process scales as O(n²). Additionally, pairwise methods don’t consider the global ranking of documents; they focus only on individual pairs during comparisons, which can lead to inconsistencies in the overall ranking.
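To see where the quadratic blow-up comes from, here is a small sketch of generating candidate pairs for a single query. The labels are made up, and real pipelines usually restrict which pairs are formed (e.g. only pairs with different labels), but the number of candidates still grows with the square of the list size.

```python
# Pair generation for one query: every pair of documents with different
# labels becomes a training example, so the count grows as O(n^2).
from itertools import combinations

docs = ["d1", "d2", "d3"]
labels = {"d1": 1, "d2": 0, "d3": 1}  # made-up relevance labels

pairs = []
for d_i, d_j in combinations(docs, 2):
    if labels[d_i] != labels[d_j]:
        # Order each pair so the more relevant document comes first.
        pair = (d_i, d_j) if labels[d_i] > labels[d_j] else (d_j, d_i)
        pairs.append(pair)

print(pairs)  # [('d1', 'd2'), ('d3', 'd2')]
```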
Listwise
In listwise ranking, the goal is to optimize the entire list of documents based on their relevance to a query. Instead of treating individual documents separately, the focus is on the order in which they appear in the list.
Here’s a breakdown of how this works in ListNet and LambdaRank:
NDCG (Normalized Discounted Cumulative Gain): I’ll dive deeper into NDCG in another blog, but for now, think of it as a way to measure how well the ordering of items matches their relevance. It rewards relevant items appearing at the top of the list and normalizes the score for easier comparison.
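As a rough illustration, a minimal NDCG computation using the standard log2 position discount might look like this; the relevance values are invented for the example.

```python
# Minimal NDCG sketch: `relevances` lists the true relevance of each document
# in its ranked position. DCG discounts relevance by position; NDCG divides
# by the DCG of the ideal (sorted) ordering so the score lands in [0, 1].
import math

def dcg(relevances):
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 1, 2]))  # relevant items near the top -> score close to 1
print(ndcg([1, 2, 3]))  # most relevant item last -> noticeably lower score
```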
In listwise ranking, if you have a list of documents (d1, d2, d3), the model considers all possible permutations of these documents:
- (d1, d2, d3)
- (d1, d3, d2)
- (d2, d1, d3)
- (d2, d3, d1)
- (d3, d1, d2)
- (d3, d2, d1)
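For a toy list these permutations are easy to enumerate, though the count grows factorially with the list size, which is part of what makes exhaustive listwise training expensive.

```python
# Enumerating all orderings of a 3-document list; with n documents there
# are n! permutations.
from itertools import permutations

for order in permutations(["d1", "d2", "d3"]):
    print(order)
```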
Training Process:
- Score Prediction: The model predicts a score for each document in the list, and the documents are ranked according to these scores. For example: s1 = f(q1, d1), s2 = f(q1, d2).
- Ideal Ranking: The ideal ranking is obtained by sorting the documents based on their true relevance. For example, d1 might be the most relevant, followed by d3, and then d2.
- NDCG Calculation: NDCG is calculated for each permutation of the document list. It checks how close the predicted ranking is to the ideal ranking, considering both relevance and the positions of the documents.
- Penalizing Incorrect Rankings: If the predicted ranking differs from the ideal one, the NDCG score drops. For example, if the ideal ranking is (d1, d3, d2) but the model ranks (d2, d1, d3), the NDCG score will be lower because the most relevant document (d1) isn’t at the top.
- Gradient Calculation: The model computes gradients based on how much the NDCG score would change if the order of documents were adjusted. These gradients guide the model on how to improve its predictions, as sketched in the example below.
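As a concrete (and deliberately simplified) example of a listwise objective, here is a ListNet-style top-one cross-entropy sketch. The scores and relevance values are made up, and real implementations such as LambdaRank work with NDCG-weighted gradients rather than this exact loss.

```python
# ListNet-style listwise loss (top-one probability): both predicted scores and
# true relevances are turned into probability distributions over the list, and
# cross-entropy measures how far apart they are.
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()

def listnet_loss(predicted_scores, true_relevances):
    p_true = softmax(true_relevances)   # target distribution over the list
    p_pred = softmax(predicted_scores)  # model's distribution over the list
    return float(-np.sum(p_true * np.log(p_pred + 1e-12)))

true_rel = [3, 1, 2]                 # d1 most relevant, then d3, then d2
good = [2.5, 0.3, 1.4]               # scores that roughly match the ideal order
bad = [0.3, 2.5, 1.4]                # d2 mistakenly pushed to the top
print(listnet_loss(good, true_rel))  # lower loss
print(listnet_loss(bad, true_rel))   # higher loss -> model gets penalized
```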
This process helps the model learn to optimize the entire ranking list, improving the relevance of documents presented to users.
Summary
When it comes to Learning to Rank, there’s no one-size-fits-all approach. Pointwise models are super easy to set up and update, but they don’t always take into account how documents relate to each other. That said, if you need something simple and fast, they’re a great option.
On the other hand, pairwise and listwise methods are more powerful because they look at how documents compare to one another. But with that power comes added complexity 😛, and listwise methods in particular can be a real challenge to train.
Personally, I find the pairwise approach to be the sweet spot. It strikes a good balance between complexity and performance, making it ideal for many situations.
At the end of the day, the method you choose really depends on your situation. How big and complicated is your dataset? Knowing the pros and cons of each method will help you pick the one that works best for what you’re trying to do.
That’s a wrap for today! Stay tuned for the next part. Until then, happy ranking! 😊
References:
From RankNet to LambdaRank to LambdaMART: An Overview
Learning to Rank: From Pairwise Approach to Listwise Approach
Introduction to Learning to Rank