The relationship between Recall and Precision
Michael Buckland, Fredric Gey. 1994. (View Paper → )
Empirical studies of retrieval performance have shown a tendency for Precision to decline as Recall increases. This article examines the nature of the relationship between Precision and Recall. The relationships between Recall and the number of documents retrieved, between Precision and the number of documents retrieved, and between Precision and Recall are described in the context of different assumptions about retrieval performance. It is demonstrated that a tradeoff between Recall and Precision is unavoidable whenever retrieval performance is consistently better than retrieval at random. More generally, for the Precision–Recall trade-off to be avoided as the total number of documents retrieved increases, retrieval performance must be equal to or better than overall retrieval performance up to that point. Examination of the mathematical relationship between Precision and Recall shows that a quadratic Recall curve can resemble empirical Recall–Precision behavior if transformed into a tangent parabola. With very large databases and/or systems with limited retrieval capabilities there can be advantages to retrieval in two stages: Initial retrieval emphasizing high Recall, followed by more detailed searching of the initially retrieved set, can be used to improve both Recall and Precision simultaneously. Even so, a tradeoff between Precision and Recall remains.
The precision-recall tradeoff is an important truth for both search and machine learning. Product managers need to understand the context in which their products or models are being used and strike the right balance.