10 Papers Every Product Manager Should Read

Most of us associate papers with science and academia. In my experience, the Product, Design, and Delivery communities overlook papers as a source of insight and inspiration.

By delving into papers, we can gain a deeper understanding of the surrounding disciplines.

Here are 10 papers worth reading that I hope will inspire you to seek out more on your quest to make better products.

The Evaluation of Business Strategy Richard Rumelt (1980)

It is impossible to demonstrate conclusively that a particular business strategy is optimal or even to guarantee that it will work. One can, nevertheless, test it for critical flaws….

Rumelt suggests a strategy should meet all four of these criteria:

Consistency: The strategy must not present mutually inconsistent goals and policies.
Consonance: The strategy must represent an adaptive response to the external environment and to the critical changes occurring within it.
Advantage: The strategy must provide for the creation and/or maintenance of a competitive advantage in the selected area of activity.
Feasibility: The strategy must neither overtax available resources nor create unsolvable subproblems.

If not, treat it with skepticism.

Nudging: A Very Short Guide Cass R. Sunstein (2014)

This brief essay offers a general introduction to the idea of nudging, along with a list of ten of the most important “nudges”...

I refer back to this paper once a month. The 10 nudges are a great checklist. Often, you’re not doing everything you can to influence user behaviour. You can’t help but make a better product if you keep them in mind.

A Comprehensive Overview of Software Product Management Challenges Olga Springer & Jakub Miler (2022)

Product Management is a new discipline - and we’re still working things out. It’s taking a while for the theory to catch up with the reality. You may find the gap is wider if you’re in a traditional company far from Silicon Valley or in an industry where there are barriers to innovation. To drive the Product Management community forward, it’s important that we identify and close that gap - that’s why this paper is so interesting.

Taking into account frequency and severity, these are the top problems faced by Product Managers:

Determining the true value of the product that the customer needs
Strategy and priorities are changing frequently
Technical debt
Working in silos (problem with communication, synchronisation between teams)
Balancing between reactive and proactive work. When comparing hypotheses with facts, hypotheses lose in value to facts (such as clients’ requests, bugs). Managing requirements instead of identifying problems and opportunities, seeking innovation.
Lack of support for research (no resources allocated to the team)
Lack of automated testing
Product Manager role not clearly defined and communicated in the organisation (what the role is about, what the responsibilities and objectives are, decisiveness)
Lack of user research
Roadmap focused on features instead of goals and business value

Machine Learning: The High-Interest Credit Card of Technical Debt D. Sculley et al. (2014)

Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning.

Machine Learning is eating code - but that doesn’t mean it’s a free lunch. This paper highlights the types of technical debt you’re likely to encounter and underestimate. ML has all the complexities of normal code, plus an extra layer of system-level complexity (changing anything changes everything).

The Eighty Five Percent Rule for optimal learning Wilson, Shenhav, Straccia, Cohen (2019)

Researchers and educators have long wrestled with the question of how best to teach their clients be they humans, non-human animals or machines. Here, we examine the role of a single variable, the difficulty of training, on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks. For all of these stochastic gradient-descent based learning algorithms, we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and biologically plausible neural networks thought to describe animal learning.

Together with behaviour change, the ability to learn efficiently has always interested me. I’d heard of the Goldilocks learning zone before: that sweet spot of difficulty, not so hard that a problem is intractable and we lose motivation, not so easy that we succeed all the time and stop progressing. It’s interesting to see that appear in machine learning algorithms too. It aligns closely with what was seen in this study of classrooms.
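If you're wondering where a number as specific as 15.87% comes from: it matches the standard normal distribution's cumulative probability at minus one, which is what you'd expect if the noise in the learner's decisions is assumed to be Gaussian. That reading is my assumption rather than a quote from the paper, and the function name below is just for illustration. A minimal sketch of the arithmetic:

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


# Assumption: Gaussian noise, so the sweet-spot error rate is Phi(-1).
optimal_error_rate = standard_normal_cdf(-1.0)
optimal_accuracy = 1.0 - optimal_error_rate

print(f"optimal error rate: {optimal_error_rate:.2%}")  # 15.87%
print(f"optimal accuracy:   {optimal_accuracy:.2%}")    # 84.13%
```

So the 'Eighty Five Percent Rule' is a rounded-up 84.13%.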

The Science of Usability Testing Jean E. Fox (2015)

Usability testing has evolved from the stringent methods of experimental psychology, to less controlled, more qualitative tests, to the wide variety of methods used today. As the methods have evolved, researchers have studied many aspects of usability testing with the goal of better understanding how to best implement, plan, conduct, and interpret tests.

As well as the insights below, the paper gives you a feel for how usability testing has evolved over time.

Number of participants

Around four to five participants can reveal about 80% of usability problems, with diminishing returns for each additional participant.
You’ll need more participants if you’re testing a complex system, or if your user group is not homogeneous.
General guidelines suggest 5-10 participants per user group for qualitative tests and 20-30 for quantitative tests.
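The diminishing returns are easier to see with a simple model. The sketch below assumes every participant independently runs into any given problem with the same probability. The default of 0.31 is a commonly cited average that I've plugged in as an assumption, not a figure taken from this paper, and the function name is made up for illustration.

```python
def share_of_problems_found(participants: int, detection_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once.

    Assumes each participant hits each problem independently with
    probability `detection_rate` (a hypothetical default, tune it for
    your own product).
    """
    return 1.0 - (1.0 - detection_rate) ** participants


for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} participants -> {share_of_problems_found(n):.0%} of problems")
```

Under those assumptions five participants surface roughly 84% of problems, and doubling to ten only adds about 14 percentage points. A complex system or a varied user group effectively lowers the detection rate, which is why you need more people in those cases.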

Number of Trained Observers

A single observer might miss issues that multiple evaluators could identify.
The number of trained observers (evaluators) significantly affects the identification of problems in usability testing. The lesson here is to involve the team.
Adding more evaluators can be as effective as adding more participants, especially when participant recruitment is challenging or time is limited.

Use of the Think-Aloud Method

The think-aloud method can influence test results; for example, it can make participants more aware of their thought processes, potentially leading to faster problem-solving.
Different approaches to think-aloud (e.g., traditional vs. relaxed methods) can have varying effects on aspects like task time and mental workload.
While think-aloud is a powerful tool for identifying usability problems, its implementation should be carefully considered, especially in tests focusing on quantitative measures.

The Magical Number Seven, Plus or Minus Two Miller (1956)

Miller proposed as a law of human cognition and information processing that humans can effectively process no more than seven units, or chunks, of information, plus or minus two pieces of information, at any given time. That limit applied to short-term memory and to a number of other cognitive processes, such as distinguishing different sound tones and perceiving objects at a glance.

Even if the science of 70 years ago isn’t perfect, the subject matter of this paper is becoming increasingly relevant. The constraints of mobile phones and audio interfaces make information architecture more important than ever. I find ‘seven, plus or minus two’ to be a useful way to start a conversation about user journeys and information overload. Also - the title of this paper is one of the best examples of copywriting I can think of.

Applying behavioural insights to challenges in health policy Dominic King (2015)

Many of the more significant challenges we face in healthcare - such as reducing smoking, encouraging exercise and improving clinician adherence to evidence-based guidelines - will only be resolved if we are more successful at changing behaviours. The traditional tools used when thinking about influencing behaviour include legislation, regulation and information provision. Recently, interest has been shown in policies that ‘nudge’ people in particular directions; drawing on major advances in our understanding that behaviour is strongly influenced (in largely automatic ways) by the context and situation within which it is placed.

For a product to be successful, you often need to change people’s behaviour. Changing behaviour is hard, so PMs can benefit from leveraging research in this space. I found the list of ‘Mindspace’ effects worth considering.

Messenger - we are heavily influenced by who communicates information to us
Incentives - our responses to incentives are shaped by mental shortcuts (e.g. loss aversion)
Norms - we are strongly influenced by what others do
Defaults - we 'go with the flow' of pre-set options
Salience - our attention is drawn to what is novel and seems relevant to us
Priming - our acts are often influenced by subconscious cues
Affect - our emotional associations can powerfully shape our actions
Commitments - we seek to be consistent with our public promises, and reciprocate acts
Ego - we act in ways that make us feel better about ourselves.

The relationship between Recall and Precision Michael Buckland, Fredric Gey (1994)

Empirical studies of retrieval performance have shown a tendency for Precision to decline as Recall increases. This article examines the nature of the relationship between Precision and Recall. The relationships between Recall and the number of documents retrieved, between Precision and the number of documents retrieved, and between Precision and Recall are described in the context of different assumptions about retrieval performance. It is demonstrated that a tradeoff between Recall and Precision is unavoidable whenever retrieval performance is consistently better than retrieval at random. More generally, for the Precision–Recall trade-off to be avoided as the total number of documents retrieved increases, retrieval performance must be equal to or better than overall retrieval performance up to that point. Examination of the mathematical relationship between Precision and Recall shows that a quadratic Recall curve can resemble empirical Recall–Precision behavior if transformed into a tangent parabola. With very large databases and/or systems with limited retrieval capabilities there can be advantages to retrieval in two stages: Initial retrieval emphasizing high Recall, followed by more detailed searching of the initially retrieved set, can be used to improve both Recall and Precision simultaneously. Even so, a tradeoff between Precision and Recall remains.

The precision-recall tradeoff is an important truth for both search and machine learning. Product managers need to understand the context in which their products or models are being used and strike the right balance.
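To make the tradeoff concrete, here is a minimal sketch with made-up data: eight search results, each with a hypothetical relevance score from a ranker and a label saying whether it was actually relevant. Lowering the score threshold retrieves more results, which pushes recall up and precision down. The scores, labels, and function name are all invented for illustration.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every item scoring >= threshold is retrieved."""
    retrieved = [label for score, label in zip(scores, labels) if score >= threshold]
    true_positives = sum(retrieved)
    total_relevant = sum(labels)
    # Convention: precision is 1.0 when nothing is retrieved.
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical ranker output (1 = relevant, 0 = not relevant).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.9, 0.7, 0.5, 0.3):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold {threshold}: precision {precision:.2f}, recall {recall:.2f}")
```

With this toy data, a strict threshold of 0.9 gives perfect precision but finds only half of the relevant results, while a threshold of 0.3 finds all of them at 50% precision. Which end you lean towards depends on what a miss costs your users compared with a false alarm.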

Clinical reminder alert fatigue in healthcare: a systematic literature review protocol using qualitative evidence. Katarzyna Lewandowska et al. (2020)

In conditions of intensive therapy, where the patients treated are in a critical condition, alarms are omnipresent. Nurses, as they spend most of their time with patients, monitoring their condition 24 h, are particularly exposed to so-called alarm fatigue.

Alarm fatigue is a real thing for healthcare practitioners. Be careful not to design your product in isolation: think about who will be using it and what their environment is like.